Securing AI Agents: From Theory to Reality

If you needed a case study in why AI agent security matters, January 2026 delivered one. ClawdBot—an open-source AI assistant that could integrate with messaging platforms and execute system commands—went from launch to 100,000 GitHub stars in about three weeks. It also went from launch to thousands of exposed instances leaking API keys, OAuth credentials, and chat histories in about the same timeframe.

Security researchers found unauthenticated admin panels accessible from the internet. One demonstrated a prompt injection attack via email that exfiltrated user data in five minutes. Another uploaded a malicious skill to the project's GitHub repo and 16 developers across 7 countries downloaded it within 8 hours. Enterprise security firms reported that 22% of their customers had employees running ClawdBot without IT approval.

This wasn't a failure of the technology itself. It was a preview of what happens when autonomous AI agents meet real-world deployment without adequate security and governance frameworks. And it's exactly what IBM's latest guidance on Agentic AI is trying to help organizations avoid.

What Makes Agents Different?

An AI agent is an autonomous system that combines tools, data, and logic to execute tasks end-to-end—without step-by-step human direction. Unlike a chatbot that responds to prompts, an agent can schedule meetings, process transactions, deploy code, or execute workflows across multiple systems on its own.

Gartner predicts that by 2028, one-third of enterprise applications will incorporate agentic AI. That's a massive expansion of autonomous systems making decisions and taking actions inside organizational boundaries.

What makes agents a distinct security challenge:

  • They act, not just advise. Agents can trigger changes, execute actions, and move data across systems without manual approval for each step.
  • They cross boundaries. One agent workflow can span multiple clouds, applications, and data sources—making lateral movement easier if compromised.
  • They hold powerful keys. Agents often have broad or persistent permissions that, if exploited, can be abused at scale.

IBM's Jeff Crume and Josh Spurgin dive in to the topic:

The Threat Landscape

IBM's recent guidance outlines several categories of attacks targeting AI agents:
  • Hijacking: Attackers take control of an agent to make it operate on their behalf.
  • Prompt Injection: The number one attack type. Unauthorized commands are inserted to make the AI perform unintended actions. The ClawdBot email attack was exactly this—a malicious prompt embedded in an email that the agent processed.
  • Infection: AI models can be infected with malware or malicious code, similar to traditional software.
  • Data Poisoning: Subtle modifications to training data can have devastating effects on the model's behavior downstream.
  • Evasion Attacks: Manipulating inputs to confuse the AI and produce incorrect results.
  • Extraction: Attackers harvest the model itself or extract sensitive information it has processed.
  • Zero-Click Attacks: Data exfiltration without any user interaction—the agent processes malicious content automatically.
  • Denial of Service: Overwhelming the agent with requests to make it unavailable.

The Governance Gap

Beyond technical attacks, agents introduce governance challenges that traditional IT frameworks weren't designed to address:

  • Autonomy vs. Oversight: When should an autonomous agent require human approval before acting? The answer isn't always obvious.
  • Transparency and Explainability: Understanding why an agent made a particular decision can be difficult, especially with complex multi-step workflows.
  • Bias: Agents trained on biased data can perpetuate discrimination in automated decisions.
  • Accountability: When an agent makes a harmful decision, who is responsible? The developer? The deploying organization? The user who initiated the workflow?

Recommended Safeguards

IBM's guidance recommends a layered approach to securing AI agents:
  1. Discover all AI instances. You can't secure what you don't know exists. This includes shadow AI—tools employees are using without IT approval. (Remember: 22% of enterprises had unauthorized ClawdBot deployments.)
  2. Implement AI Security Posture Management. Ensure AI systems comply with organizational security policies: encryption, authentication, access controls, and configuration baselines.
  3. Conduct penetration testing. Test agents with adversarial inputs—including prompt injections—to verify they reject improper requests.
  4. Deploy AI-specific firewalls. A protection layer between users and the AI that examines incoming prompts and outgoing responses for malicious content or data leakage
  5. Establish governance pillars. Lifecycle governance (approving agents from inception to production), risk and regulation compliance, and continuous monitoring and evaluation.

The key insight from IBM is security and governance are interdependent. Governance without security is fragile—your rules collapse if the model is compromised or data is poisoned. Security without governance is blind—you're protecting a system that may be biased, lack oversight, or produce unexplainable decisions.

Practitioner Notes

AI agents are coming to healthcare whether we're ready or not. Clinical documentation assistants, prior authorization automation, patient scheduling agents, diagnostic support workflows—the use cases are multiplying. Here's what this means for healthcare security teams:

  • The 72-hour window is real. ClawdBot went from security disclosure to widespread exploitation in under a week. Healthcare's traditional patch and review cycles aren'tt built for this velocity. Agent security incidents will require faster response playbooks.
  • Shadow AI is a HIPAA problem. When clinical staff experiment with AI tools on personal devices or unapproved services, PHI follows. Your AI discovery process needs to find these deployments before an auditor does.
  • Prompt injection meets clinical workflows. Any agent that processes external content—patient messages, faxed records, insurance responses—is a potential prompt injection target. Validate inputs before they reach the agent.
  • Permissions need clinical context. An agent that can access the EHR needs role-based constraints that mirror clinician access patterns. Broad permissions become HIPAA minimum necessary violations at scale.
  • Audit trails are non-negotiable. When an agent accesses PHI, you need defensible documentation of what it did and why. Current tools often don't provide this out of the box.

Want to Learn More?

We're actively building out the Learning Center with more AI security content. In the meantime: