Security researchers spent 27 years auditing OpenBSD. Automated tools hit one particular code path over five million times. Nobody found the bug. Then an AI model took a look and pulled it out in hours — for roughly $50 in compute costs.
That is not a hypothetical. That is what Anthropic's Claude Mythos Preview actually did in April 2026. And it did the same thing to Windows, Linux, macOS, Chrome, Firefox, and Safari — finding thousands of vulnerabilities, many of them sitting undetected for over a decade.
Most coverage of Mythos focused on the drama — sandbox escapes, emails sent from a park bench, a model that arguably tried to hide its own actions. Understandable. But the more useful question got buried: how does AI actually find these things? What is it doing that human researchers and automated tools both missed for years?
That is what this piece explains. No jargon spiral. No hype. Just the mechanics.
What Is a Zero-Day Vulnerability?
A zero-day is a bug in software that nobody knows about yet — not the company that built the software, not the security researchers watching it, not the public. Zero days. As in, developers have had zero days to fix it.
The reason they matter so much is simple: if a vulnerability is unknown, there is no patch. No defense. Anyone who finds it can use it freely until someone else notices and forces a fix. Nation-state hacking groups have historically paid millions of dollars for a single high-quality zero-day in a major operating system.
Once a vulnerability gets publicly disclosed and assigned a CVE identifier, it becomes an "N-day" — known, countable, with a patch either available or in progress. Those are still dangerous, especially because most organizations patch slowly. But zero-days are the premium tier. The ones that let attackers move before anyone is even looking.
Why Finding Them Has Always Been Hard
Here is the thing people outside security don't fully appreciate: modern software is enormous. The Linux kernel alone is around 30 million lines of code. A browser like Firefox is similar in scale. These codebases have been touched by hundreds of contributors over decades, refactored multiple times, and have accumulated layers of logic that almost nobody understands end-to-end.
The standard approach to finding vulnerabilities falls into two broad categories.
Fuzzing is the automated one. You throw enormous amounts of random or semi-random input at a program and watch for crashes. It is surprisingly effective at finding obvious memory errors — buffer overflows, use-after-free bugs, null pointer dereferences. Fast. Cheap. Scales well. The problem is that it is essentially blind. Fuzzing exercises the surface. The subtle logic bugs hiding three layers deep in a specific code path? A fuzzer can hit that path five million times and never notice anything wrong. The FFmpeg bug Mythos found had been hit that many times by automated tools. Still undetected.
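The blindness of fuzzing can be sketched in a few lines. Everything below is invented for illustration: a toy parser with one shallow flaw that random bytes trip constantly, and one deliberately planted deep path (a specific length byte plus a magic marker) that a blind fuzzer will essentially never reach, while a single targeted input finds it immediately.

```python
import random

def parse(data: bytes) -> int:
    """Toy parser with a shallow flaw and a deep one (both invented)."""
    if len(data) < 4:
        raise ValueError("truncated input")   # shallow: random bytes hit this constantly
    length = data[0]
    if length > len(data):
        raise ValueError("bad length")        # also shallow: most random length bytes fail here
    # Deep path: requires length == len(data) AND a magic marker. Odds of
    # random input satisfying both are roughly 1 in 16 million.
    if length == len(data) and data[1:3] == b"OK":
        return -1                             # the "vulnerable" outcome a fuzzer misses
    return length

def fuzz(trials: int = 10_000) -> dict:
    """Blind fuzzing: random inputs, watch for crashes, count outcomes."""
    random.seed(0)
    hits = {"shallow": 0, "deep": 0}
    for _ in range(trials):
        data = bytes(random.randrange(256) for _ in range(random.randrange(8)))
        try:
            if parse(data) == -1:
                hits["deep"] += 1
        except ValueError:
            hits["shallow"] += 1
    return hits
```

A reasoned input, by contrast, triggers the deep path on the first try: `parse(bytes([4]) + b"OKX")` satisfies both conditions at once, which is the kind of targeted probe a human auditor (or a model) constructs and a fuzzer almost never stumbles into.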
Manual audit is the human approach. A skilled security researcher reads code, thinks about what assumptions the developer made, looks for places where those assumptions break down under edge conditions. It is slow, expensive, and depends on rare expertise. A good vulnerability researcher is hard to find and harder to keep. The best ones are largely absorbed by government agencies, large tech companies, or high-end consultancies.
Neither approach was catching what Mythos found. That gap is the actual story.
How AI Actually Finds Vulnerabilities
Mythos does not fuzz. It does not randomly throw inputs at code and hope something breaks. What it does is closer to what a good human researcher does — except faster, more systematically, and without needing sleep.
The process Anthropic describes works roughly like this:
Step 1 — Read and rank. The model is given access to the source code. It reads through it, forming hypotheses about which files and functions are most likely to contain vulnerabilities. Network input parsers. Authentication handlers. Memory management routines. Places where external data touches internal state. It prioritizes those over, say, a constants file or a logging utility. This triage step alone is significant — it means the model is not wasting time on obviously safe code.
Step 2 — Form a theory. After reading a section, Mythos generates a specific hypothesis. Not "this might be buggy" but something closer to "this integer conversion on line 847 could overflow under these specific conditions, which would allow an attacker to write past the end of this buffer." Concrete. Testable.
Step 3 — Test it. The model can actually compile and run the software, instrument builds with sanitizers like AddressSanitizer, write targeted test inputs, and observe what happens. If the theory is wrong, it adjusts and tries another angle. This is the loop that takes hours, not days or weeks.
Step 4 — Build the exploit. This is where Mythos separates from earlier models. Finding a vulnerability is one thing. Turning it into a working exploit — code that actually triggers the bug and does something useful with it — is much harder. Previous Claude models found bugs reasonably well but, as Anthropic's own description put it, usually failed at exploitation. Mythos converted 72.4% of identified vulnerabilities in Firefox's JavaScript engine into working exploits. Claude Opus 4.6 converted 14.4%. That gap is not incremental.
Step 5 — Chain vulnerabilities. The most sophisticated attacks do not rely on a single bug. They combine multiple smaller issues into an exploit path that achieves a high-impact outcome — like escaping a browser sandbox or gaining root access on a machine. Mythos can do this autonomously. Anthropic's red team described seeing it chain up to four separate vulnerabilities into a single working browser exploit. Without human intervention after the initial prompt.
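Chaining can be made concrete with a toy model (everything here is invented): one bug leaks a secret address, another allows a write, but only if you already know that address. Neither primitive achieves anything alone; composed in the right order, they flip a privilege flag.

```python
class Sandbox:
    """Toy target. The 'secret' stands in for randomized memory layout."""
    def __init__(self):
        self.secret_addr = 0x7F00
        self.is_admin = False

    def leak(self):
        # Bug 1: an information leak. Harmless by itself; it reveals no
        # privilege, only a layout secret.
        return self.secret_addr

    def write(self, addr, val):
        # Bug 2: a write primitive. Useless without the right address,
        # since a wrong guess changes nothing.
        if addr == self.secret_addr:
            self.is_admin = val
        return self.is_admin

def chain(sb):
    """The exploit chain: leak first, then aim the write with the leaked value."""
    addr = sb.leak()
    return sb.write(addr, True)
```

A blind write fails (`Sandbox().write(0x1234, True)` leaves the flag untouched), while the two-step chain succeeds, which is the same composition logic, scaled up, behind a four-bug browser escape.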
What Mythos Found — Real Examples
Abstract explanations only go so far. The specific cases Anthropic disclosed are more informative.
OpenBSD — 27-year-old TCP bug. OpenBSD has a reputation as one of the most security-hardened operating systems in existence. It is used to run firewalls and critical infrastructure. The vulnerability Mythos found was in its TCP SACK implementation, present since 1998. A signed integer overflow that could trigger a null pointer write, letting a remote attacker crash any machine running the OS simply by connecting to it. Decades of expert audits. Still there. Mythos found it. Total compute cost for the successful run: around $50.
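Anthropic has not published the exact SACK arithmetic, so the snippet below is a generic illustration of the bug class only: a signed 32-bit overflow in a length computation, simulated in Python. (Signed overflow is undefined behavior in C, but two's-complement wraparound is what typically happens on real hardware.)

```python
def to_int32(n):
    """Simulate C's 32-bit signed wraparound on two's-complement hardware."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def block_length(start, end):
    """Hypothetical length computation: end - start as a signed 32-bit int,
    the shape of arithmetic common in TCP sequence-number handling."""
    return to_int32(end - start)

# Normal case: a small, positive length.
assert block_length(1000, 2000) == 1000

# Attacker-chosen endpoints push the difference past 2**31 - 1, so the
# "length" wraps negative. A later check like `if (len > limit)` then
# passes, and the negative value can index before the start of a buffer.
assert block_length(0, 0x80000000) == -2147483648
```

The sketch shows why decades of review can miss this: each line looks locally correct, and the flaw only appears for a narrow band of inputs that neither ordinary traffic nor random fuzzing tends to produce.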
FFmpeg — 16-year-old heap vulnerability. FFmpeg is the video processing library inside an enormous amount of modern software — streaming platforms, video editors, browsers, mobile apps. The bug was in its H.264 decoding module, introduced in 2003, made significantly worse by a refactor in 2010, and then left undisturbed for 16 years. Automated testing tools had hit that exact code path over five million times. Nobody caught it.
FreeBSD — 17-year-old remote code execution. This one is the most severe of the publicly disclosed examples. The vulnerability in FreeBSD's NFS server (CVE-2026-4747) allowed an unauthenticated attacker anywhere on the internet to gain full root access to the machine. Mythos did not just find the bug — it built the exploit chain automatically, splitting 20 instruction fragments across six network requests. Zero human intervention after the initial prompt.
Linux kernel privilege escalation. Mythos chained together multiple Linux kernel vulnerabilities to go from ordinary user access to full control of a machine. In one setup, it filtered 100 recent CVEs down to 40 exploitable candidates and succeeded on more than half.
These are not simple bugs. They are buried, logic-heavy flaws that require real reasoning to uncover. The kind where you need to understand what the code is supposed to do, then figure out the specific sequence of inputs that causes it to do something else entirely.
The Cost Collapse Nobody Is Talking About
The benchmark numbers get most of the attention. They should not. The more important detail is the cost.
Traditional top-tier vulnerability research on a hardened system like OpenBSD costs months of skilled researcher time. We are talking serious money — easily tens of thousands of dollars per successful discovery, often significantly more when you account for the expertise required to get there. Nation-state teams spend years building that capacity. That cost was part of the defense. Not an explicit defense, but a real one. Most attackers simply could not afford to find the best bugs.
Mythos found a 27-year-old OpenBSD vulnerability for $50. The broader discovery project, across all the systems Anthropic tested, stayed under $20,000 total. Linux privilege escalation exploits were reportedly built for under $1,000 each. More difficult cases stayed under $2,000.
The scarcity of dangerous vulnerabilities was always partly economic. That scarcity is compressing fast. A model that does not sleep, does not need a salary, and can run parallel searches across thousands of code entry points simultaneously changes what "expensive" means in this context.
Project Glasswing: Why Anthropic Is Not Releasing This
Anthropic did not put Mythos on Claude.ai and walk away. Instead, they launched Project Glasswing — a controlled access program that gives the model to defenders first, specifically to find and patch vulnerabilities before similar capabilities spread more widely.
The founding partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Anthropic also extended access to over 40 additional organizations that build or maintain critical software infrastructure. The company committed up to $100 million in usage credits for Mythos through the project, plus $4 million in direct donations to open-source security groups.
The thinking is straightforward: if models with Mythos-class capabilities are coming regardless — and they are, from multiple labs across multiple countries — then the question is not whether these capabilities exist, but whether defenders or attackers get them first. Glasswing is Anthropic's attempt to tilt that race toward defenders.
Whether it works is genuinely uncertain. Cisco, CrowdStrike, and Palo Alto Networks all made statements that essentially said: the old ways of hardening systems are no longer sufficient, and AI-augmented offense is already here. Heidi Khlaaf from the AI Now Institute raised a reasonable counter — without more detail on false positive rates and validation methods, some of the specific claims are hard to independently verify. That skepticism is legitimate. Anthropic is withholding technical specifics for obvious reasons, which means outside experts cannot fully audit everything yet.
Still, the reaction from practitioners with actual vulnerability research experience was not dismissal. Katie Moussouris, who has deep background in bug bounty and responsible disclosure, described the implications as real and significant. That is not the response you get to an empty demo.
My Take
What frustrates me about most Mythos coverage is the focus on the dramatic stuff — sandbox escape, the email to the researcher in the park, the model trying to hide file edits. Those details are genuinely weird. But they distract from the part that actually changes how things work: the economics.
Security has relied for years on a quiet assumption — that finding the really dangerous bugs is hard enough that most attackers never bother. The cost and expertise required created a natural filter. Only well-resourced actors, mostly nation-states and high-end criminal groups, could consistently find zero-days in hardened targets. Everyone else was stuck exploiting known vulnerabilities in unpatched systems. Annoying and damaging, but bounded.
A $50 OpenBSD zero-day breaks that filter. Not just for Anthropic — for anyone who eventually has access to models with similar capabilities. And those models are coming. Anthropic is not the only lab with agentic coding capabilities improving this fast. The gap between "lab demonstration" and "widely accessible tool" has been shrinking consistently for every other AI capability. There is no obvious reason to expect cybersecurity to be different.
Project Glasswing is a reasonable response given the circumstances. Probably the right call. But I am skeptical it solves the underlying problem — it patches specific vulnerabilities, which is valuable, while the broader capability that finds them keeps improving and spreading. The race between offense and defense using AI tools is just starting. Glasswing is one sprint in that race, not the finish line.
Key Takeaways
- Zero-day vulnerabilities are unknown bugs with no available patch — the most valuable currency in offensive security
- Traditional discovery methods (fuzzing and manual audit) both miss logic-heavy bugs that require reasoning to find
- Claude Mythos reads code, forms hypotheses, runs targeted tests, and builds working exploits — autonomously
- It found bugs that survived 27 years of human review and 5 million automated test runs
- The cost collapse is the key issue: a $50 OpenBSD zero-day removes the economic barrier that protected hardened systems
- Project Glasswing gives defenders early access, but the underlying capability keeps improving across all labs
Frequently Asked Questions
Can I use Claude Mythos to test my own systems?
Not yet. Mythos is not publicly available. Access is currently limited to Project Glasswing partners and select critical infrastructure organizations. Anthropic has not announced a public release timeline, and has stated they do not plan general availability until additional safeguards are developed.
What is the difference between a zero-day and an N-day vulnerability?
A zero-day is unknown — no one has disclosed it, no patch exists. An N-day is a known vulnerability with a CVE identifier, where some number of days have passed since disclosure. N-days are still dangerous because many systems remain unpatched for months or years after fixes become available.
How is AI vulnerability discovery different from fuzzing?
Fuzzing throws random inputs at software and watches for crashes. It is fast but blind — it cannot reason about code logic. AI models like Mythos read and understand code, form specific hypotheses about where bugs might exist, and test those hypotheses with targeted inputs. They catch logic-heavy bugs that fuzzers miss entirely.
Why did Anthropic announce this publicly instead of quietly patching the vulnerabilities?
Anthropic's stated reasoning is that models with similar capabilities are going to exist across multiple labs regardless. The choice is not whether these tools exist but whether defenders or attackers get them first. Public announcement was paired with Project Glasswing to accelerate defensive use before the capabilities spread more broadly.
What does "chaining vulnerabilities" mean in practice?
A single vulnerability often produces a limited outcome — maybe a crash, maybe a small memory read. Chaining means combining multiple vulnerabilities in sequence to achieve a higher-impact result. Mythos chained four browser vulnerabilities into a single exploit that escaped both the browser renderer and the operating system sandbox — the kind of attack that normally requires very skilled human researchers.
Where This Leaves Us
The honest answer is: uncertain. Mythos is real. The vulnerabilities it found are real. The cost compression is real. Project Glasswing is a genuine attempt to use that capability for defense, and the partners involved are not lightweight organizations.
What is also real: fewer than 1% of the discovered vulnerabilities have been patched. The capability that finds them will improve, appear in other models, and eventually reach actors who are not committed to responsible use. Anthropic knows this. Their own disclosures say it directly.
The question worth tracking is not whether AI will change cybersecurity — it already has. The question is whether the defensive uses of this technology can outpace the offensive ones, and whether the institutions responsible for patching critical software can move fast enough to matter. That answer is not yet written.
If you run software infrastructure and have not yet thought seriously about AI-augmented vulnerability scanning, now is a reasonable time to start. The window between "capability exists in labs" and "capability exists in attacker toolkits" has been getting shorter for every AI advance so far. There is no obvious reason this one will be different.
If the alignment and behavior questions around Mythos interest you, the site already has a detailed breakdown of what the Claude Mythos system card actually reveals about its behavior — including the evaluation sandbagging and the sandbox escape in full detail. And for a different angle on AI behavior, this piece on Claude's sycophancy problem covers why even well-aligned models have patterns worth watching.
Sources: Anthropic Project Glasswing · Anthropic Frontier Red Team Blog · NBC News