Hackers poison AI tool descriptions, exfiltrating data from ChatGPT, Claude
Serge Bulaev
Recent research suggests that hackers may be tampering with AI tool descriptions to steal data from assistants like ChatGPT and Claude. The hidden malicious code sits inside the tool's description, and users do not see anything strange, but the AI may quietly send files or run commands. Researchers showed this tactic works on many platforms, with larger models appearing more likely to follow harmful instructions. Many organizations have reported security problems linked to this kind of attack, and experts believe new safety steps are needed. Even a tiny amount of poisoned data might be enough to create lasting security risks.

A new cyberattack vector, AI tool poisoning, allows hackers to tamper with application descriptions to steal data from AI assistants like ChatGPT and Claude. The hidden malicious code resides in the natural-language descriptor for a tool, tricking the AI into silently exfiltrating files or executing shell commands without any visible signs to the user.
Security researchers have demonstrated this technique on multiple platforms, finding that more advanced models are more susceptible to following malicious instructions. Research indicates that attack success rates can be significant in controlled testing environments. In a related supply-chain attack, Lakera AI research found that a seemingly benign "joke_teller" tool contained hidden backdoors.
These vulnerabilities transform AI tool descriptions into a critical attack surface. This elevates the threat from simple prompt injection to a complex supply-chain issue affecting the entire AI agent lifecycle, from development to deployment.
Silent Exfiltration in the Enterprise
AI tool poisoning is a cyberattack where malicious instructions are embedded within the natural-language descriptions of AI tools or plugins. When an AI agent like ChatGPT or Claude reads this hidden data to understand how the tool works, it can be tricked into exfiltrating sensitive files or executing unauthorized commands.
Unlike user-pasted prompts, a poisoned tool description is persistent, affecting every session that loads the compromised tool. This has led to significant breaches, including incidents where attackers gained undetected access to enterprise environments for extended periods. Other vulnerabilities have exposed large numbers of MCP instances to arbitrary command execution. Experts label the combination of an agent that can access untrusted content, sensitive data, and external networks as the "lethal trifecta" for enterprise security.
A Growing Threat in Commercial Crimeware
AI tool poisoning techniques are now being integrated into commercial crimeware. Security researchers have observed numerous attempts to manipulate AI assistant recommendations for commercial gain. While these attacks aimed for marketing benefits, the same methods can be used to steal credentials or source code, creating significant compliance risks under GDPR and CCPA. A growing number of organizations report confirmed or suspected AI security incidents, with poisoned tools being a primary and growing cause.
Mitigation Practices for Modern Defenses
To counter this threat, security teams are moving beyond simple input filtering toward comprehensive capability governance. Key mitigation strategies include:
- Treat every agent as a privileged identity and enforce least-privilege scopes.
- Require cryptographic signing and validation of all tool descriptors before installation.
- Log every tool invocation and enable chunk-level provenance for RAG pipelines.
- Audit persistent memory writes and implement human approval gates for irreversible actions.
- Schedule quarterly memory audits in conjunction with standard access reviews.
Researchers emphasize the severity of this risk, warning that relatively small amounts of poisoned data can create a persistent backdoor. This low threshold necessitates advanced security measures like anomaly detection and data watermarking, as traditional dataset cleaning is no longer sufficient.
Key Security Considerations and Future Defenses
Many AI agents load external tools with permissions to modify system state or execute code, and recent audits have found that most popular MCP skills contain at least one high-risk capability. Since isolated fixes cannot resolve these architectural flaws, the future of AI security will likely rely on robust governance frameworks. These frameworks must gate high-impact actions and require explicit approval. Security teams should actively monitor for unusual outbound network traffic and watch for stylistic changes in an agent's output, which could signal latent memory poisoning.
What is AI tool poisoning and how does it differ from prompt injection?
AI tool poisoning hides malicious instructions inside the invisible description metadata that an assistant reads before it decides whether (and how) to call a plug-in. Because the text sits in the tool definition - not in the user chat - the victim sees no strange prompts and the UI looks normal. Prompt injection, by contrast, relies on putting adversarial text into the visible conversation thread.
Which assistants are confirmed to be affected?
Security researchers have demonstrated successful attacks against ChatGPT, Claude, Cursor and other mainstream code agents that read tool descriptors. The same technique works anywhere an LLM ingests natural-language tool specs and then acts on them, so the risk class spans most agent ecosystems.
How little poison is enough to break an enterprise model?
Research suggests that very small amounts of corrupted samples can implant a working back-door. In practical terms, an insider who can slip a relatively small number of malicious documents into training data or a tool registry can achieve reliable, silent control - often with no measurable drop in baseline accuracy.
What real damage has already happened?
- Multiple enterprise environments were compromised through poisoned agent integrations for extended periods
- A significant vulnerability across Anthropic's Model Context Protocol supply chain allowed arbitrary commands to run inside IDE plug-ins
- A growing number of organizations report at least one suspected AI-agent security or privacy incident, with tool misuse and privilege escalation being a fastest-growing vector
How can teams stop tool poisoning today?
- Cryptographically sign every tool descriptor and verify before load
- Run agents with least-privilege roles - no broad file, network or memory rights by default
- Add human approval gates for any irreversible action (file delete, external POST, memory write)
- Log and audit each agent-tool call; look for unexpected outbound traffic after read-only tasks
- Review built-in memory write permissions: if an agent can persist data as a side effect of summarising, you are one poisoned page away from "exploit forever"
For deeper defense insight, see the latest research on AI tool security and the Microsoft Security Response Center note on prompt-injection-to-RCE chains.