
NSA Test of Anthropic AI ‘Mythos’ Exposes Deep U.S. Classified Network Vulnerabilities
In a controlled red‑team exercise, NSA chief Gen. Joshua Rudd says Anthropic’s AI model Mythos gained access to nearly all classified systems within hours, raising urgent questions about digital defenses built for human adversaries. The result is a warning to governments, contractors and allies: next‑generation AI is not just a tool for defending networks; it can also supercharge attacks against them.
A U.S. National Security Agency red‑team exercise using Anthropic’s advanced AI model Mythos found that the system was able to penetrate almost all of the agency’s classified networks in a matter of hours, according to NSA Director Gen. Joshua Rudd. The authorized test, described publicly on 21 June, is one of the starkest official acknowledgments yet that the architecture of America’s most sensitive digital defenses remains calibrated to human attackers, not autonomous systems that can probe and adapt at machine speed.
Rudd said the red‑team drill allowed Mythos to operate as an attacker under carefully controlled conditions. Within hours, he said, the AI model had effectively gained access to nearly all classified systems the NSA exposed to it. Rudd did not disclose operational details, methods or the specific systems involved. Still, the broad outcome, a near‑total compromise in a tightly supervised environment, suggests that even elite intelligence networks can be rapidly mapped and exploited by sufficiently capable AI tools once those tools are given a foothold.
For intelligence professionals, the finding cuts in two directions. On one hand, the NSA now has a proof‑of‑concept that AI can be used as an internal stress test capable of relentlessly hammering at its own defenses in ways no human team could match. On the other, the results confirm that if similar capabilities are developed by hostile states, criminal syndicates or rogue insiders, long‑standing assumptions about how long it takes to breach a hardened network — and how quickly defenders can respond — may no longer apply.
The operational stakes radiate beyond Fort Meade. Classified systems at the NSA do not exist in isolation; they are threaded into joint communications with the Pentagon, other intelligence agencies, and in some cases allied services. If a model like Mythos can escalate privileges, pivot between enclaves and stitch together misconfigurations faster than defenders can detect anomalies, then the risk extends to any partner network that touches U.S. systems. For defense contractors and critical infrastructure operators who rely on government‑issued security certifications, the implication is blunt: a seal of approval designed for human‑scale threats might not be enough against AI‑driven intrusions.
Technically, the red team’s success hints at how AI can compress the reconnaissance, exploitation and lateral‑movement phases of an attack. Instead of a team of human operators dividing tasks, a single model can simultaneously scan for vulnerabilities, craft exploit chains and autonomously refine its tactics based on system responses. That could turn what used to be a months‑long campaign into an operation measured in hours or days, even against targets that have spent years hardening their perimeters and segmenting their networks.
Strategically, the episode forces a reframing of AI’s role in cyber conflict. Much of the public debate has cast advanced models as potential “copilots” for defenders, automating log analysis or intrusion detection. The NSA test underscores that the same properties — pattern recognition at scale, code synthesis, relentless iteration — make AI equally potent as an offensive weapon. The question for national security planners is no longer whether AI will be used in cyber operations, but which side will use it better and first.
For lawmakers and oversight bodies, Rudd’s admission raises hard questions about remediation, disclosure and classification. If nearly all systems exposed to Mythos proved vulnerable under test conditions, how will agencies prioritize fixes in environments where some legacy systems cannot easily be patched or replaced? How should the risk be communicated to Congress and allies without disclosing technical details that could guide real attackers? And how should the U.S. regulate the proliferation of models powerful enough to replicate what Mythos achieved inside NSA walls?
The lesson that will stick with many in the defense and tech worlds is simple: the assumption that “air‑gapped” or classified networks are inherently safe from rapid, AI‑driven compromise is now much harder to sustain. Network diagrams that once looked secure on paper may, under machine scrutiny, reveal hidden pathways and weak links that humans struggle to see.
The key signals to watch next are how quickly the NSA and other agencies move from red‑team findings to concrete changes — such as new access‑control architectures, AI‑based defensive tools of their own, procurement rules for secure systems, and potentially export controls or usage guidelines for high‑end AI models. Whether U.S. allies acknowledge similar tests, or whether adversarial states hint at their own AI‑driven cyber programs, will show how fast the rest of the world is absorbing the same uncomfortable lesson.