Claude Mythos and Project Glasswing: How Anthropic Is Managing Dangerous AI Capabilities Before Public Release

AI Security Series #38

Anthropic announced on April 7, 2026, that it had developed Claude Mythos, described internally as "by far the most powerful AI model we've ever developed," with striking capabilities in cybersecurity, coding, and academic reasoning. Rather than releasing Mythos immediately for general public access, Anthropic made a deliberate choice to restrict the model and launch Project Glasswing, a defensive cybersecurity initiative giving exclusive access to over 40 partner organizations including AWS, Apple, Google, Microsoft, and JPMorgan Chase. The restricted deployment allowed Anthropic to harness Mythos's dangerous capabilities—specifically its ability to automatically develop functional cyberattacks at professional level—in a controlled context where the discoveries directly strengthen global digital infrastructure rather than enabling new attack vectors. In the five weeks since Project Glasswing launched, Mythos has identified 1,596 vulnerabilities across 281 open-source projects, with 97 already patched. Now, in late May 2026, references to `claude-mythos-1-preview` appearing in public versions of Claude Code and Claude Security before being pulled offline indicate that Anthropic is preparing a staged public release with guardrails designed to extend the Glasswing model to developers and security teams. For healthcare organizations and security analysts evaluating how to deploy powerful AI capabilities responsibly, Anthropic's approach offers both a template and a cautionary case study in the tensions between capability advancement, safety governance, and market pressure.

The emergence of Claude Mythos represents a inflection point in AI development where capability advancement has outpaced safety deployment patterns. Previous Anthropic models evolved incrementally, with Opus 4.7 released in April followed by Haiku 5.5 and Sonnet 4.6 throughout the year. Mythos represents a discontinuous leap, a "step change" in capabilities that Anthropic assessed as too dangerous for uncontrolled public release. The decision to restrict Mythos reflects organizational judgment that some capabilities should be deployed differently than consumer products. However, the decision also acknowledges that restricting powerful capabilities indefinitely is neither technically feasible nor strategically sound. Other AI companies are developing equivalent or superior models. The window for defenders to prepare for Mythos-class capabilities is narrow. Anthropic's response—restricting until guardrails are proven, then staged release—maps to responsible capability deployment in adversarial contexts like cybersecurity. Healthcare organizations facing similar decisions about deploying powerful AI systems should understand both why Anthropic restricted Mythos and how its phased release strategy attempts to resolve the tension between safety and capability.

How Claude Mythos Was Discovered and Leaked

Claude Mythos first surfaced publicly in late March 2026 when an accidental leak from Anthropic's content management system exposed close to 3,000 unpublished internal assets, including a draft blog post describing Mythos as the company's most powerful model. The leak itself is instructive because it demonstrates that even capability restriction strategies depend on operational security, and that information about restricted models can escape organizational control through infrastructure misconfigurations or insider access rather than deliberate disclosure. For Anthropic, the leak forced earlier public acknowledgment of Mythos than planned, but the company's rapid response on April 7 with Project Glasswing showed that the restriction strategy was already in place before the leak occurred. The accidental disclosure did not cause the restriction decision. It revealed that the restriction decision had already been made and executed.

The operational detail matters because it illustrates why capability governance in AI companies requires both technical restrictions and organizational process controls. Anthropic could have locked down access to Mythos through API-level restrictions, preventing unauthorized access by users. The leak demonstrates that information about the model's existence escaped without users necessarily having runtime access, suggesting that capability restriction must extend to organizational knowledge management, access logs, documentation systems, and insider threat controls. For healthcare IT and security teams evaluating how to manage dangerous capabilities like advanced diagnostic AI, clinical decision support with potential liability exposure, or infrastructure automation that could affect patient safety, this lesson applies directly. Restricting capability access is necessary but insufficient without managing organizational knowledge about what capabilities exist, who knows they exist, and what information flows through internal systems.

Project Glasswing: Defensive Capability Deployment

Rather than restricting Mythos indefinitely, Anthropic structured a phased release strategy centered on Project Glasswing, announced April 7, 2026. Glasswing gave exclusive early access to over 40 partner organizations selected for their role in critical infrastructure, security research, or defensive cybersecurity. The partnership model inverted the traditional AI release paradigm where products are made available to users and users find applications. Instead, Glasswing paired the dangerous capability with a specific use case—hardening widely deployed open-source software—and partner organizations focused the model's power on that defensive task.

The results validate the strategy. In five weeks of Project Glasswing operations, Mythos identified 1,596 vulnerabilities across 281 open-source projects. As of late May 2026, 97 of those vulnerabilities have been patched. The scale is significant because popular open-source projects like Firefox, OpenSSL, and others are components of critical infrastructure across healthcare, finance, government, and technology sectors. A vulnerability in Firefox that goes unpatched affects hundreds of millions of users. A vulnerability in OpenSSL affects systems processing sensitive data globally. By identifying these vulnerabilities through Mythos and providing the information to maintainers in a structured disclosure process with time for patches before public disclosure, Project Glasswing achieves what traditional security research struggles with: it changes the vulnerability discovery race from "who finds them first—defenders or attackers" to "defenders find them first and patch them before attackers know they exist."

Anthropic's Coordinated Vulnerability Disclosure (CVD) dashboard, updated as of May 22, 2026, documents the systematic output. The 1,596 vulnerabilities disclosed across 281 projects represent hundreds of potential attack chains that were previously unknown to defenders. The patching rate of 97 confirmed patches in 5 weeks, on open-source software where patch coordination requires convincing volunteer maintainers to prioritize fixes, demonstrates that the vulnerability identification alone has value. Maintainers take Anthropic's disclosures seriously, which likely reflects both the quality of Mythos's findings and Anthropic's established reputation in security research.

Why Anthropic Restricted Mythos

Anthropic's public rationale for restricting Mythos centered on its dangerous capabilities: the model can automatically develop functional cyberattacks at professional level. This is not hyperbole or theoretical concern. It reflects measured assessment of what Mythos can do when given specific tasks. Unlike traditional security research where expert humans identify vulnerabilities through careful analysis, Mythos can autonomously search systems for weaknesses, develop working exploits, and test them. This capability in the hands of attackers creates what Anthropic termed severe risks to global digital infrastructure.

The decision to restrict rather than release reflects organizational judgment about several factors. First, the distribution of capabilities is asymmetric. If Mythos is deployed widely, many organizations gain access, but defenders and infrastructure maintainers do not gain proportional advantage because they cannot uniformly deploy security improvements faster than attackers can exploit new vulnerabilities. Second, the volume of vulnerabilities is overwhelming. Over 10,000 critical vulnerabilities were discovered in initial Mythos testing, far exceeding what security teams can realistically patch in a given timeframe. Releasing Mythos without coordination would create a scenario where attackers gain powerful attack-finding capability while defenders lack capacity to patch all the newly discovered vulnerabilities. Third, the threat model includes actors that are state-sponsored or well-resourced criminal organizations. Restricting Mythos makes it harder for these actors to acquire the model, either through legal access or through attempted theft, and raises the operational cost of obtaining dangerous capabilities.

However, Anthropic also acknowledged that restricting Mythos indefinitely is not feasible. Other AI companies are developing models with equivalent capabilities. OpenAI, Google, and other frontier AI labs are building systems approaching Mythos-class performance. The narrow window where only one organization has access to dangerous capabilities closes quickly as multiple vendors achieve capability parity. Anthropic's position—stated explicitly by CEO Dario Amodei—is that the appropriate strategy is not to restrict capabilities but to ensure that defenders get access first and have time to harden systems before attackers acquire the capability. This shifts the framing from "should dangerous capabilities exist" to "who gets access first and how do we ensure the advantage goes to defenders."

The Staged Release Strategy: Claude Code and Claude Security

The appearance of `claude-mythos-1-preview` in public Claude Code and Claude Security references on May 25-26, 2026, followed by removal before full deployment, signals that Anthropic is implementing the next phase of staged release. Rather than making Mythos available to all Claude users immediately, the company is integrating it into two specific products: Claude Code and Claude Security. This represents a deliberate narrowing of access within the broader Claude ecosystem.

Claude Code is Anthropic's agentic developer environment where developers delegate coding tasks to Claude. Integrating Mythos into Claude Code would provide developers with access to Mythos's superior code reasoning and vulnerability-finding capabilities for their own projects. This enables developers to find and fix vulnerabilities in their codebases before shipping them to production. The advantage is localized—developers improve their own code security immediately without waiting for global patch coordination. The constraint is that Mythos access through Claude Code is tied to development workflows and remains visible within Anthropic's platforms, reducing the risk that the capability is repurposed for attack development offline.

Claude Security is Anthropic's vulnerability scanning and patch-suggestion product, currently in public beta for Enterprise customers. Powering Claude Security with Mythos would dramatically improve vulnerability detection accuracy and enable the platform to identify zero-day vulnerabilities that other tools miss. Organizations running Claude Security on their codebases would gain access to Mythos's capabilities within a security-focused workflow where findings are channeled toward remediation rather than exploitation. The constraint is that Claude Security is an enterprise product with authentication, audit logging, and organizational governance. Vulnerability findings are captured within the platform and associated with specific organizations, reducing the risk that discoveries leak to attackers.

The staged release through Claude Code and Claude Security attempts to resolve the tension between capability availability and safety constraints. Rather than making Mythos available broadly or restricting it indefinitely, Anthropic creates two deployment contexts where the capability is available but the applications are constrained toward defensive uses. Developers improve their own security. Organizations running Claude Security improve their infrastructure. Both represent Glasswing-like applications—defenders getting access first, using the capability to harden systems before attackers acquire it—but distributed through commercial products rather than restricted partnerships.

Guardrail Architecture and Risk Control

Anthropic's decision to move from Project Glasswing's restricted partnerships toward broader Claude Code and Claude Security access implies that the company has developed guardrails sufficient to mitigate the risk of misuse. The specific technical details of these guardrails remain undisclosed—Anthropic does not publish security architectures for models with dangerous capabilities—but the organizational assessment appears to be that the risk profile has changed from "should not deploy at all" to "can deploy with constraints." Understanding what this shift reflects is important for healthcare organizations making similar assessments about deploying powerful AI systems.

The guardrails likely operate at multiple levels. First, operational controls: Mythos access through Claude Code and Claude Security operates through Anthropic's platforms with authentication, rate limiting, and audit logging. Users cannot download Mythos weights and run the model offline. They access the model through APIs where usage can be monitored and unusual patterns detected. This differs from open-weight models like Llama or Mistral, which can be downloaded and deployed locally without oversight. Maintaining Mythos as a closed API model gives Anthropic visibility into usage patterns and ability to revoke access if misuse is detected.

Second, alignment and behavioral controls: Anthropic has trained Mythos with Constitutional AI principles and specific instruction tuning to refuse requests for attack development, malware creation, or other clearly harmful uses. These controls are not perfect—sufficiently sophisticated users can prompt-engineer around behavioral constraints—but they raise the friction cost of misuse and make attacks detectable. Users attempting to use Mythos to develop cyberattacks would encounter refusals and would need to expend effort to circumvent them, creating opportunities for detection through usage logging.

Third, contextual constraints: Claude Code and Claude Security are purpose-built for specific workflows. Claude Code is a developer IDE where using Mythos to develop cyberattacks is conspicuous and logged. Claude Security is a vulnerability scanning platform where discovering vulnerabilities is the intended use. The workflows constrain how the capability can be repurposed. This differs from a general-purpose chat interface where harmful uses might blend in with benign ones.

Fourth, rate and volume controls: Enterprise products can implement rate limiting and volume caps to prevent abuse. If a user or organization attempts to use Mythos to systematically discover vulnerabilities across the entire internet for exploitation, rate limiting would slow them down and create detectable patterns. A user with legitimate defensive intentions would hit these limits rarely. A user with offensive intent would encounter them consistently.

These controls are not fool-proof. Sufficiently sophisticated and resourced attackers could potentially circumvent them. However, they shift the threat model from "Mythos is freely available to anyone, including attackers" to "Mythos is available with constraints, and using it for attacks is harder and riskier than acquiring alternatives." In a world where multiple AI companies are developing Mythos-class models, making one of them slightly harder to use for attacks while enabling defenders to access it first might provide meaningful advantage.

Healthcare Implications: Software Security and Clinical AI Governance

Healthcare organizations encounter Mythos-class capability governance challenges in two contexts: hardening healthcare software infrastructure and managing powerful clinical AI systems. The Anthropic model offers lessons for both.

For healthcare IT and security teams, the lesson is that some dangerous capabilities should be deployed defensively first. Healthcare infrastructure—EHR systems, PACS, pharmacy systems, medical device firmware—contains vulnerabilities. Ransomware gangs, state-sponsored actors, and criminal organizations actively search for these vulnerabilities to deploy attacks or extract ransom. A Mythos-class capability that finds these vulnerabilities faster than attackers can discover them through their own research represents a significant defensive advantage. Healthcare organizations should be participating in programs equivalent to Project Glasswing where frontier AI capabilities are used to harden healthcare infrastructure before those capabilities become available for offensive use.

OCR and health sector regulators have not yet articulated policies on healthcare organizations using powerful AI capabilities like Mythos for security research. However, the baseline expectation is that covered entities implement reasonable security measures to protect ePHI. Using AI vulnerability research to identify and patch weaknesses in healthcare systems likely satisfies that expectation and should be documented as part of security governance. Healthcare security teams should audit their vulnerability management processes and assess whether they would benefit from advanced AI assistance in identifying weaknesses before attackers do.

The second healthcare implication is managing clinical AI governance. Healthcare organizations are increasingly deploying AI systems for diagnostic support, treatment recommendations, and clinical decision support. Some of these systems are powerful—equivalent to or approaching Mythos-class capabilities in their respective domains. The question healthcare IT and clinical governance teams face is: how do we deploy these capabilities safely? Anthropic's framework suggests several principles:

First, restrict powerful capabilities until guardrails are proven. Healthcare organizations should not deploy clinical AI systems that significantly outperform existing decision support without first validating that the system's reasoning is trustworthy and that clinicians understand its limitations. This is analogous to Anthropic's decision to hold Mythos back from public release until guardrails were ready.

Second, pair capabilities with defensive use cases. A diagnostic AI system should be deployed first in screening or detection contexts where it augments human clinicians rather than replacing them. This mirrors Project Glasswing's approach of pairing Mythos with defensive cybersecurity tasks. A treatment recommendation AI should be deployed with mandatory physician review before any treatment is administered. The workflow constrains how the capability can be misused.

Third, implement graduated access. Not all clinicians need access to the most powerful AI systems. Residents in training might use a restricted version. Attending physicians might have broader access. This graduated model allows the organization to pilot powerful systems with experienced users before generalizing. Anthropic's approach of starting with Project Glasswing partners and gradually expanding through Claude Code and Claude Security follows this principle.

Fourth, maintain audit logging and governance oversight. Healthcare organizations must log all uses of powerful clinical AI systems, understand what decisions the AI influenced, and validate that outcomes are consistent with safety expectations. This is mandatory for clinical governance but also supports the Anthropic model of maintaining visibility into usage patterns to detect misuse.

The Broader Question: Will Restricted Release Work?

The fundamental uncertainty in Anthropic's strategy is whether restricted release actually changes threat outcomes. If attackers acquire Mythos through theft, insider access, or simply waiting for competing models to mature, does it matter that Anthropic delayed public release by a few months? The question touches on deeper disagreements in AI safety about whether restricting capabilities delays harms or merely shifts them to actors with access to alternative models.

Anthropic's position is that delay matters. If defenders get access first and use that time to patch millions of vulnerabilities, then the advantage persists even if attackers later acquire the capability. If defenses are hardened before attackers have powerful attack-finding tools, the attack surface is smaller. This is the argument for Project Glasswing: the 1,596 vulnerabilities Mythos found and coordinated with maintainers are now fixed. When other Mythos-equivalent models eventually become available to potential attackers, those specific vulnerability chains are no longer available to exploit.

However, the strategy has limits. Attackers might find different vulnerabilities through other models or traditional security research. The time window for defenses to prepare is narrow if multiple vendors are developing equivalent capabilities. And coordinated vulnerability disclosure requires cooperation from open-source maintainers and commercial vendors, who have varying incentives and capabilities to patch quickly. A vulnerability identified through Mythos is only valuable if it gets patched before deployment in production systems. The responsibility for patching lies with maintainers and organizations, not with Anthropic.

Healthcare organizations can draw specific lessons: delay in deploying dangerous capabilities matters most when combined with widespread hardening of systems that could be affected. Restricting Mythos matters if the restriction period is used to patch vulnerabilities. If Anthropic delays release but vulnerabilities are not patched, the delay achieves nothing. Similarly, deploying powerful clinical AI matters most when combined with changes to clinical workflows and oversight structures that make the capabilities safer to use. If healthcare organizations deploy diagnostic AI without changing physician workflows or validation processes, the restriction strategy similarly achieves nothing.

What This Means for Healthcare CISOs

Healthcare information security leaders should view Claude Mythos and Project Glasswing as a case study in responsible powerful AI deployment with several implications for healthcare security governance. First, healthcare IT should assess whether participation in similar vulnerability discovery programs would improve healthcare infrastructure security. This might mean working with vendors to pilot advanced AI security research, ensuring that vulnerabilities discovered are patched before public disclosure, and building partnerships where frontier AI capabilities are deployed defensively first.

Second, healthcare security teams should update their risk models to assume that future AI capabilities will enable faster vulnerability discovery. The question is not whether attackers will have powerful AI tools but when, and healthcare organizations should be building resilience in infrastructure assuming that adversaries will have capabilities equivalent to Mythos. This might mean implementing additional detection controls, segmentation to limit blast radius if systems are compromised, and monitoring for indicators of advanced exploitation techniques.

Third, healthcare organizations should evaluate their clinical governance frameworks to determine whether they support responsible deployment of progressively more powerful clinical AI systems. As AI capabilities improve and more organizations deploy diagnostic, treatment, and workflow AI, healthcare governance structures must evolve to manage these capabilities safely. This includes clinical validation frameworks, audit logging and monitoring systems, clinician training on AI limitations, and governance review processes for new system deployments.

Fourth, healthcare CISOs should establish direct communication channels with Anthropic and other AI vendors about healthcare-specific security concerns. Project Glasswing demonstrates that vendors consider security partnerships with enterprises important enough to warrant restricted early access to dangerous capabilities. Healthcare organizations should be articulating their security needs and vulnerability management priorities to vendors, positioning themselves as potential Glasswing-like partners if similar programs emerge for healthcare infrastructure hardening.

The Timeline Forward

The brief appearance of `claude-mythos-1-preview` references in Claude Code and Claude Security on May 25-26, 2026, before removal, suggests a staged rollout is underway. The timing is uncertain—Anthropic may deploy gradually, test guardrails further, or delay based on additional security assessment. However, the appearance of production-ready model strings indicates that deployment code is in place and testing is occurring. Healthcare organizations should prepare for Mythos availability in Claude Code and Claude Security within weeks to months, though precise timing remains undisclosed.

When access becomes available, healthcare security teams should evaluate whether Mythos through Claude Code would improve their vulnerability research and patching processes. Organizations with security research teams and vulnerability management programs might pilot Mythos to identify weaknesses before external attackers discover them. The cost is modest—Mythos access through Claude Code would be tied to existing Claude subscriptions—and the potential benefit is significant.

For clinical AI governance, healthcare organizations should treat Mythos as a signal that AI capabilities will continue advancing rapidly. Clinical governance frameworks established today for Opus 4.7-class models must be flexible enough to accommodate Mythos-class and future models without requiring complete redesign. This suggests implementing graduated access, robust audit logging, validation workflows that can accommodate stronger models, and clinician training programs that emphasize that increasingly powerful AI systems still require human oversight and judgment.

Conclusion

Claude Mythos and Project Glasswing represent a deliberate strategy for managing dangerous AI capabilities: restrict until guardrails are proven, deploy defensively first to harden systems before attackers acquire equivalent capabilities, implement graduated access through specific products with constrained use cases, and maintain visibility and governance oversight. The strategy acknowledges that indefinite restriction is neither feasible nor optimal, but that uncontrolled release creates unacceptable risks. For healthcare organizations navigating similar decisions about powerful AI systems—both security research capabilities and clinical AI—the framework offers principles worth adopting.

The core insight is that capability governance is not binary (available or unavailable) but contextual and graduated. Dangerous capabilities can be made available defensively, with constraints, through specific workflows, with oversight, in ways that reduce but do not eliminate risks. Healthcare organizations can apply this approach to clinical AI governance, phasing in deployment, pairing with clinical workflow changes, monitoring outcomes, and maintaining visibility into how the systems are used. The goal is not to prevent powerful AI from being deployed in healthcare—the capabilities will exist regardless—but to ensure that when they are deployed, it is in ways that prioritize patient safety and clinical governance.


Key Links