The security problem with AI agents

The big picture ✨

_{Image from Wanan Wanan on Shutterstock.}

AI agents are rapidly becoming part of real engineering workflows.

Tools like Cursor, Claude Code, and GitHub Copilot are evolving from coding assistants into systems that actively execute complex tasks.

But while the industry focuses heavily on productivity gains, a major risk is being overlooked.

AI agents are privileged automation systems. They often operate with the same credentials and access rights as the developer using them. If compromised or manipulated, they can read private code, leak secrets, modify repositories, or run arbitrary commands.

Over the past year alone, multiple high-severity vulnerabilities have been disclosed across widely used AI developer tools, including Cursor, Claude Code, Copilot, and OpenAI’s Codex CLI. These incidents expose a consistent pattern: agents are easy to manipulate when they interact with untrusted inputs.

The problem is not theoretical. And it is not limited to experimental tools. It is emerging across mainstream development environments.

Recent incidents ⚠️

Several incidents over the past year show that agent vulnerabilities are not hypothetical. They are already appearing across mainstream developer tools.

OpenClaw agent takeover
Oasis Security demonstrated the ClawJacked attack chain, where a malicious website could connect to a developer’s local OpenClaw gateway and take over the agent. Other vulnerabilities allowed gateway token theft and remote command execution.

GitHub Copilot prompt-injection attacks
Malicious GitHub issues could inject prompts into Copilot in Codespaces and leak GITHUB_TOKEN. Separately, CamoLeak vulnerability was discovered, where hidden Markdown instructions could cause Copilot Chat to exfiltrate private repository data.

Claude Code repository compromise
Security researchers disclosed vulnerabilities where opening a malicious repository in Claude Code could execute commands or leak API keys through repository-controlled configuration files.

OpenAI Codex CLI configuration exploit
Malicious project configuration files could cause Codex CLI to execute attacker-controlled MCP servers automatically when the project was opened.

Replit agent deletes production database
During a development experiment, a Replit AI coding agent accidentally deleted a production database, prompting the company to introduce stricter safeguards around development and production separation

Common attack pattern

Despite the variety of tools involved, most AI agent vulnerabilities follow a similar chain:

untrusted input → prompt injection → agent misuse → data exfiltration or code execution

Attackers often embed instructions in places that agents read automatically:

GitHub issues
pull requests
documentation
code comments
configuration files

Because language models cannot reliably distinguish between data and instructions, agents may execute malicious commands embedded in otherwise normal-looking content.

Once an agent executes those commands using its privileged access, the attacker gains control over the system.

Why agent security is fundamentally harder 🔒

_{Pluto Security}

Traditional development tools assume that humans are making the decisions. Agents change that assumption.

Agents run with developer privileges. A lot of agents operate using the developer’s credentials. If manipulated, the agent effectively becomes a remote actor with access to internal systems.

Repository content becomes executable input. Configuration files, scripts, and documentation can become execution paths when agents automatically parse and act on them.

Plugin ecosystems introduce supply-chain risk. Agent platforms increasingly rely on plugins, MCP servers, or extension marketplaces. These ecosystems create new attack surfaces similar to browser extensions or package registries.

What this means for the industry 🔮

The development ecosystem is moving toward agent-driven workflows.

Agents will increasingly write code, debug failures, maintain CI/CD pipelines, update documentation, and much more. But this automation dramatically expands the attack surface.

Every AI agent is effectively capable of interacting with infrastructure, credentials, and internal codebases.

Organizations that treat agents like simple chatbots will expose themselves to serious security risks. Instead, agents must be treated like deploy bots or CI/CD systems: high-privilege automation that requires strong controls.

How agents should be built and deployed safely 🛡️

Of course, the safest way to deploy AI agents is to keep them in a sandbox with limited permissions. But that defeats the purpose. Most companies adopting agent workflows are aiming to run them in production with write access. So the real question is not how to prevent agents from touching production. It’s how to make production-safe automation possible.

Limit blast radius. Agents will need write permissions, but they shouldn’t control the entire system. Instead of broad access, restrict agents to specific services, repositories, or environments. An agent might modify application code but not IAM policies, update feature flags but not billing logic, or patch a single service without touching shared infrastructure.

Enforce deterministic checks. Agents should not enforce rules through prompts alone. High-risk actions should pass through deterministic systems such as policy engines, test gates, secret scanners, or deployment rules. The agent can propose or execute a change, but automated controls determine whether the action is allowed.

Make actions reversible. Production automation will inevitably make mistakes. Every change made by an agent should be versioned and reversible. Deployments should support automated rollbacks, configuration changes should maintain previous states, and infrastructure updates should have clear revert paths.

Provide system-level context. Many agent failures happen because the agent lacks awareness of the system it operates in. Agents often understand code but not service dependencies, infrastructure resources, or deployment pipelines. Providing structured system context allows agents to reason about how changes affect the broader system, reducing the likelihood of breaking production.

The takeaway 🧐

AI agents are quickly becoming a core component of modern developer workflows. But as these systems gain the ability to execute commands, modify repositories, and access infrastructure, they also introduce a new class of security risks.

The past year has shown that agent vulnerabilities are not isolated edge cases. They are appearing across multiple tools, ecosystems, and workflows.

The long-term solution is not banning specific tools. It is recognizing that AI agents are privileged automation systems. And this automation must be monitored and secured accordingly.

Want to Get Featured?

Want to share your dev tool, research drops, or hot takes?

Submit your story here - we review every submission and highlight the best in future issues!

Till next time,

Future of DevEx

Share the newsletter

The security problem with AI agents

The big picture ✨

Recent incidents ⚠️

Common attack pattern

Why agent security is fundamentally harder 🔒

What this means for the industry 🔮

How agents should be built and deployed safely 🛡️

The takeaway 🧐

Want to Get Featured?

Future of DevEx

Keep Reading

Future of DevEx