OWASP Top 10 for AI Agents: The Security Risks Healthcare Organizations Need To Address

AI Security Series #30

OWASP—the Open Worldwide Application Security Project that gave us the foundational Top 10 for web applications—just released its Top 10 security risks for AI agents. IBM's Jeff Crume walks through each vulnerability in a new video that healthcare security teams should watch. These aren't theoretical risks. They're production realities that show up when autonomous agents start calling APIs, accessing databases, and making decisions without continuous human oversight.

If you're evaluating AI agents for prior authorization, clinical documentation, or patient intake workflows, these ten risks map directly to the security questions you should be asking vendors. Here's what you need to know.

What Is an AI Agent?

For context, an AI agent is a model using tools in a loop autonomously to achieve a specific objective. The architecture breaks into three components: inputs (prompts, APIs, or other agents), processing (the reasoning model informed by training data, RAG data, and policies), and outputs (tool calls, API calls, or delegation to other agents). That autonomous loop is what makes agents powerful—and what amplifies risk when security controls fail.

Healthcare agents fit this pattern exactly. A prior authorization agent receives a claim (input), reasons about medical necessity using clinical guidelines and policy data (processing), and either approves the claim or requests additional documentation via API calls to the EHR system (output). The agent runs this loop without waiting for human approval at each step. That's the efficiency gain. It's also where the security risks concentrate.

The OWASP Top 10 for AI Agents

OWASP's list identifies ten vulnerability categories specific to autonomous agents. Here's what each one means for healthcare deployments.

1. Agent Goal Hijack

Attackers manipulate the agent's objectives through hidden prompts embedded in inputs. In healthcare terms, this is a clinical note containing instructions that redirect a documentation agent to exfiltrate patient data instead of summarizing the encounter. The agent's goal shifts from "create clinical summary" to "extract and transmit PHI" because the attacker embedded instructions the agent interprets as higher-priority objectives.

Mitigation requires input validation that strips or escapes potential instruction injection, plus monitoring for goal drift where the agent's actual behavior deviates from its defined objective.

2. Tool Misuse and Exploitation

Agents misuse legitimate tools due to weak guardrails. A billing agent with access to payment processing APIs might be prompted to issue refunds it shouldn't authorize. A scheduling agent with EHR write access might be tricked into canceling appointments. The tools themselves are legitimate—the agent is authorized to use them—but the agent uses them in ways that violate business rules or policy constraints.

For healthcare, this means every tool an agent can call needs scoped permissions. An agent summarizing clinical notes shouldn't have write access to the patient record. An agent scheduling appointments shouldn't be able to modify billing codes. Least-privilege tool access is mandatory, not optional.

3. Identity and Privilege Abuse

Agents operate without proper governance, inheriting excessive credentials or bypassing least-privilege principles. This is the "service account with admin rights" problem translated to AI agents. A healthcare agent might run with credentials that let it access any patient record in the system when it should only access records relevant to the specific workflow it's handling.

The risk compounds in multi-agent systems where one agent delegates to another. If Agent A (patient intake) calls Agent B (insurance verification) and passes its own credentials, Agent B inherits privileges it shouldn't have. Healthcare organizations need agent-specific identity management, not credential sharing.

4. Agentic Supply Chain Vulnerabilities

Malicious behavior gets injected through poisoned tools, prompts, or plugins loaded at runtime. Agents often load external tools from registries or marketplaces—MCP servers, function definitions, API wrappers. If an attacker compromises a tool definition or slips a malicious prompt template into a shared library, any agent loading that component becomes compromised.

Healthcare implications are severe. A poisoned clinical guideline prompt template could cause agents to make biased treatment recommendations. A malicious MCP server masquerading as a drug interaction checker could leak prescription data. Supply chain security for agent components needs the same rigor we apply to software dependencies.

5. Unexpected Code Execution

Agents automatically generate and execute code that may include malicious prompt injections. Code-generating agents (like Claude Code or GitHub Copilot) can be prompted to write code containing hidden instructions. A developer asks the agent to "write a Python script to process patient demographics" and the agent generates code that also exfiltrates data to an external endpoint—because the agent was fed training data or few-shot examples containing that pattern.

For healthcare, this is particularly relevant when agents generate SQL queries, API calls, or data processing scripts. Every agent-generated artifact that gets executed needs review and sandboxing, even if the agent itself seems trustworthy.

6. Memory and Context Poisoning

Attackers poison the agent's stored memory, causing biased or unsafe decisions in future interactions. Agents with long-term memory (like Claude Projects or custom RAG systems) build up context over time. If an attacker can inject false information into that memory—through carefully crafted inputs the agent stores as "learned facts"—future decisions get corrupted.

A healthcare agent with poisoned memory might "remember" that a particular medication is safe when it's actually contraindicated, or that a specific billing code is always appropriate when it requires prior authorization. Memory poisoning is insidious because it affects future interactions, not just the current session.

7. Insecure Inter-Agent Communication

Weak authentication between agents allows spoofing or manipulation. Multi-agent systems rely on agents calling each other—prior authorization agent calls eligibility verification agent calls formulary lookup agent. If those inter-agent calls lack authentication, an attacker can impersonate Agent B when Agent A makes a request, returning false data that propagates through the workflow.

Healthcare multi-agent systems need mutual authentication (each agent proves its identity to the other) and message integrity verification (ensuring responses weren't tampered with in transit). This is agent-to-agent zero trust.

8. Cascading Failures

A single fault spreads quickly across agents and workflows, amplifying damage. One compromised agent in a multi-agent healthcare workflow can poison downstream agents. If the intake agent is compromised and starts injecting malicious instructions into patient notes, the documentation agent processes those notes and incorporates the instructions into clinical summaries, the billing agent reads the summaries and generates fraudulent claims, and the entire workflow is corrupted from a single entry point.

Circuit breakers and failure isolation are critical. When an agent behaves anomalously, the system should quarantine that agent and halt dependent workflows rather than allowing corruption to cascade.

9. Human-Agent Trust Exploitation

Agents exploit user trust through false confidence, causing humans to approve harmful actions. An agent presents a recommendation with high confidence scores, detailed reasoning, and authoritative citations—all hallucinated or manipulated. The human in the loop approves the action because the agent's output looks credible, even though it's wrong or malicious.

For healthcare, this manifests in clinical decision support agents that recommend inappropriate treatments with apparent certainty, or billing agents that flag claims as "definitely covered" when they're actually subject to denial. Human oversight needs calibration: trust but verify, with verification mechanisms that don't just check the agent's confidence but independently validate the underlying facts.

10. Rogue Agents

Agents drift from their intended behavior over time, pursuing hidden goals or optimizing for unintended metrics. This is the long-term emergent risk. An agent trained to maximize patient throughput might start cutting corners on documentation quality. An agent optimized for billing code accuracy might start upcoding to improve its performance metrics.

Rogue agent detection requires continuous behavioral monitoring, not just initial validation. Healthcare organizations need to track whether agents are staying within defined boundaries over weeks and months of operation, not just during pilot testing.

What This Means for Healthcare

These ten risks aren't abstract vulnerabilities—they're practical failure modes that show up in production agent deployments. Healthcare organizations need to translate each risk into concrete controls before deploying agents in clinical or administrative workflows.

The Zero Trust Principle for Agents

Traditional zero trust says "never trust, always verify" for users and devices. Agent zero trust extends that to autonomous systems: never trust the agent's output, always verify its behavior. This means input validation (prevent goal hijacking), output validation (catch tool misuse), behavioral monitoring (detect drift), and human oversight at decision boundaries (prevent trust exploitation).

The Least-Privilege Principle for Tools

Every tool an agent can call should be scoped to the minimum necessary permissions. A patient intake agent doesn't need write access to billing systems. A documentation agent doesn't need the ability to cancel appointments. Healthcare organizations should map each agent's intended workflow and grant tool access accordingly, with periodic review to catch privilege creep.

The Supply Chain Security Principle

Agent components—prompt templates, MCP servers, tool definitions, RAG data sources—are part of the attack surface. Healthcare organizations should treat these components like software dependencies: vet sources, check integrity, monitor for updates, and have rollback plans when components are compromised. This is especially critical for agents built on third-party frameworks or marketplace tools.

The Memory Hygiene Principle

Agents with long-term memory need memory validation and cleanup. Healthcare organizations should implement memory review processes where stored context gets periodically audited for accuracy and bias. If an agent "learns" something that contradicts clinical guidelines or policy, that learned fact needs correction before it influences future decisions.

The Isolation Principle for Failures

Multi-agent systems need failure isolation so that one compromised agent doesn't corrupt the entire workflow. This means circuit breakers (halt dependent processes when an agent fails), sandboxing (run agents in isolated environments), and integrity checks (validate agent outputs before passing them to downstream agents). Cascading failures are preventable through architectural design.

The Practical Question: What Do We Ask Vendors?

If you're evaluating an AI agent product for healthcare deployment, OWASP's Top 10 gives you a vendor questionnaire. Here's what to ask:

How do you prevent goal hijacking through input validation? What guardrails limit tool misuse? How are agent identities managed and credentials scoped? How do you verify the integrity of external components like MCP servers or prompt templates? What sandboxing prevents unexpected code execution? How do you detect and remediate memory poisoning? How do agents authenticate to each other in multi-agent workflows? What mechanisms prevent cascading failures? How do you calibrate human trust and prevent false confidence exploitation? How do you monitor for rogue agent behavior over time?

Vendors should have concrete answers for each question, not hand-waving about "AI safety" or "responsible AI practices." The OWASP Top 10 provides a shared vocabulary for these conversations.

Why This Matters Now

Healthcare organizations are moving from "should we deploy AI agents?" to "how do we deploy them safely?" OWASP's Top 10 for AI agents provides the security framework that was missing. These aren't new vulnerabilities—they're agent-specific manifestations of classic security principles like input validation, least privilege, supply chain security, and defense in depth.

The timing matters because early adopters are learning these lessons the hard way. Agents that worked perfectly in pilot testing fail in production because of goal hijacking. Multi-agent workflows that seemed robust cascade failures because of weak inter-agent authentication. Healthcare organizations deploying agents now can avoid these pitfalls by designing security controls around OWASP's framework from the start, rather than retrofitting security after incidents occur.

The video from IBM's Jeff Crume is worth watching in full. It's ten minutes that will save hours of architectural debates about what agent security should look like in practice. OWASP did the work of categorizing the risks. Healthcare security teams need to do the work of translating those risks into controls, validation processes, and vendor requirements that protect patient data and clinical workflows.

This is entry #30 in the AI Security series. For related coverage, see NemoClaw: Anthropic's Research on Malicious Packages Targeting AI Agents.