axios@1.14.1 or plain-crypto-js. A separate supply-chain attack on the axios package ran concurrently with this leak. Treat any affected machine as compromised and rotate credentials immediately.
The people calling this a "massive leak" are mostly wrong. The people saying it's nothing are also wrong. Here is what actually happened, what the code actually says, and why one specific detail buried in the technical fine print matters far more than 44 hidden feature flags or a Tamagotchi pet.
On March 31, 2026, an Anthropic engineer shipped version 2.1.88 of the @anthropic-ai/claude-code npm package with a 59.8 MB .map file included. Source maps are debug artifacts. They exist to translate minified production bundles back into readable source. They are meant to live in development environments. They are emphatically not meant to ship to npm. But this one did. Within hours, the full TypeScript codebase — approximately 512,000 lines across roughly 1,900 files — was archived to a public GitHub repository, which was forked over 41,500 times before Anthropic could respond.
Anthropic confirmed the leak to The Register: "This was a release packaging issue caused by human error, not a security breach." No model weights. No API credentials. No customer data. That part is accurate. But "not a security breach" does not mean "not a problem." The actual problem is more interesting than either camp wants to admit.
How a .map File Exposes 512,000 Lines of TypeScript
Source maps are straightforward in concept. When you compile TypeScript to minified JavaScript, you lose all the variable names, file structure, and readable logic. A .map file is essentially an index — it says "this minified token at position X corresponds to this original source line Y in this original file Z." Bundlers also commonly populate the map's optional sourcesContent field, which embeds the entire original source inside it verbatim, in readable form. That is how a single file could carry the whole codebase.
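The structure follows the Source Map v3 format, and recovering embedded source from one is only a few lines. A minimal sketch — the field names come from the v3 format, but the file path and contents in the test are hypothetical:

```typescript
// Minimal sketch of reading a Source Map v3 file and pulling out
// any embedded original source (the sourcesContent field).
interface SourceMapV3 {
  version: number;
  sources: string[];           // original file paths
  sourcesContent?: (string | null)[]; // full original source, verbatim
  mappings: string;            // VLQ-encoded position index
}

function extractSources(raw: string): Map<string, string> {
  const map = JSON.parse(raw) as SourceMapV3;
  const out = new Map<string, string>();
  (map.sourcesContent ?? []).forEach((content, i) => {
    if (content != null) out.set(map.sources[i], content);
  });
  return out;
}
```

Anyone who downloaded the npm tarball only had to run something like this over cli.js.map to get readable TypeScript back.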
Claude Code uses Bun as its bundler. There is an open Bun bug — filed March 11, still unresolved — reporting that Bun serves source maps in production mode even though its documentation says they should be disabled. If that bug is what caused this, then the mistake was upstream from the specific engineer who shipped the release. Either way, the result was the same: cli.js.map, all 59.8 MB of it, shipped inside the npm tarball, publicly readable by anyone who knew to look.
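One practical takeaway for anyone shipping npm packages: it is easy to check whether a published package (or your own tarball) contains stray source maps. A small Node sketch, assuming you point it at an unpacked package directory:

```typescript
import * as fs from "fs";
import * as path from "path";

// Recursively scan a directory for shipped .map files.
function findSourceMaps(dir: string): string[] {
  const hits: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      hits.push(...findSourceMaps(full));
    } else if (entry.name.endsWith(".map")) {
      hits.push(full);
    }
  }
  return hits;
}
```

Running this over node_modules/@anthropic-ai/claude-code on an affected install would have surfaced the 59.8 MB file immediately.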
VentureBeat reported that this is Anthropic's second accidental exposure in a week — the previous one being the Mythos/Capybara model spec that leaked through an improperly cached public document. It is becoming a pattern of basic operational security failures at a company that is reportedly operating at a $19 billion annualized revenue run rate and is preparing for an IPO.
The irony noted by multiple people on Hacker News: Claude Code contains an entire system called Undercover Mode specifically designed to prevent internal codenames and information from appearing in public git commits. Anthropic built a leak-prevention system into Claude, then shipped the source of that system to npm in plaintext.
KAIROS: The Always-On Agent Anthropic Never Announced
The most substantive unshipped feature in the codebase is called KAIROS — a reference to the ancient Greek concept of "the right moment." It appears over 150 times in the source, which suggests it is not a prototype. It is a nearly complete system that is simply gated behind a compile-time flag set to false in external builds.
What KAIROS actually does: it transforms Claude Code from a tool you invoke into a daemon that runs continuously. It maintains append-only daily log files recording observations, decisions, and actions. It operates a background process called autoDream — a nightly memory consolidation routine that runs when the session is idle. The autoDream logic merges observations, removes logical contradictions, and converts "vague insights into absolute facts," according to the internal comments. The result is that when you return to a session the next morning, the agent's context has been cleaned and reorganized, not wiped.
The implementation uses a forked subagent for autoDream, which is architecturally significant. It means the memory consolidation process runs in isolation so it cannot corrupt the main agent's active reasoning state. That is a non-trivial engineering decision that suggests the team has thought carefully about the failure modes of persistent memory in production agents.
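The append-only log plus consolidation pattern described above can be sketched abstractly. Everything below is invented for illustration; only the behavior — append-only daily records, a later pass that merges and deduplicates them — comes from the leaked comments:

```typescript
// Hypothetical sketch of an append-only observation log with a nightly
// consolidation pass, loosely modeled on the autoDream description.
// All names here except autoDream's described behavior are invented.
type Observation = { ts: number; text: string };

class DailyLog {
  private entries: Observation[] = [];

  // Append-only: observations are recorded, never edited in place.
  append(text: string): void {
    this.entries.push({ ts: Date.now(), text });
  }

  // Stand-in for a consolidation pass: keep each distinct observation
  // once, preferring its most recent occurrence.
  consolidate(): string[] {
    const seen = new Set<string>();
    const kept: string[] = [];
    for (const entry of [...this.entries].reverse()) {
      if (!seen.has(entry.text)) {
        seen.add(entry.text);
        kept.unshift(entry.text); // restore chronological order
      }
    }
    return kept;
  }
}
```

The real system reportedly does much more (contradiction removal, promoting insights to facts), but the core data-flow is this shape: raw appends all day, a cleaning pass while idle.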
KAIROS also includes agent scheduling and cron jobs — the ability to run tasks on a fixed schedule, not just on demand. The leaked code notes show a teaser rollout planned for April 1–7 (which is why some people initially assumed it was an April Fool's joke) with a full launch gated for May 2026, starting with Anthropic employees.
Fake Tool Injection: Anthropic Was Poisoning Competitor Training Data
This is the detail that got the most traction on Hacker News the day the leak broke, and for good reason. In claude.ts, there is a feature flag called ANTI_DISTILLATION_CC. When enabled, Claude Code sends a parameter called anti_distillation: ['fake_tools'] in its API requests to Anthropic's servers. The server responds by silently injecting fake, non-functional tool definitions into the system prompt.
The explicit purpose: if someone is intercepting Claude Code's API traffic to train a competing model on Claude's tool-use patterns and reasoning chains, the fake tools pollute that training data. Any model trained on this intercepted traffic would learn incorrect tool definitions and produce degraded output.
This flag is gated behind a GrowthBook feature flag called tengu_anti_distill_fake_tool_injection and is only active for first-party CLI sessions — meaning it only fires when the request comes from the official Claude Code client, not the API directly. There is also a second mechanism in betas.ts: server-side summarization that buffers assistant text between tool calls, summarizes it, and returns the summary with a cryptographic signature. This makes the raw reasoning chain harder to extract and reuse.
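Mechanically, decoy injection is simple to sketch. Only the anti_distillation: ['fake_tools'] parameter and the first-party-only gating come from the leak; the function below, its signature, and the mixing strategy are assumptions for illustration:

```typescript
type ToolDef = { name: string; description: string };

// Hypothetical server-side sketch: mix decoy tool definitions into the
// list a scraper would capture. Decoy names and this function are invented.
function injectDecoys(
  realTools: ToolDef[],
  decoys: ToolDef[],
  firstPartyCliSession: boolean,
): ToolDef[] {
  // Per the leak, injection only fires for official CLI sessions.
  if (!firstPartyCliSession) return realTools;
  // Sort the combined list so padded entries are not trivially last.
  return [...realTools, ...decoys].sort((a, b) => a.name.localeCompare(b.name));
}
```

A model trained on intercepted traffic containing this list would learn tool names and schemas that do not exist, which is exactly the degradation the mechanism is after.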
Whether this is defensible practice is a legitimate debate. Anthropic is protecting its intellectual property from what amounts to API scraping for training data. But the mechanism — silently injecting false information into API responses — is something that should have been disclosed to users. The tool definitions a developer sees should match the tools that actually exist.
Capybara v8 Has a 29–30% False Claims Rate. That's a Regression.
The Capybara model — the internal codename for what appears to be a Claude 4.6 variant, with Fennec mapping to Opus 4.6 and Numbat as an unreleased model still in testing — was already known from the Mythos model spec leak earlier in the week. The source code adds concrete performance numbers that were not in that leak.
Internal comments in the code note a 29–30% false claims rate in Capybara v8. The comparison point that makes this striking: Capybara v4 had a 16.7% false claims rate. The team is currently iterating on v8 of a model that is performing worse on this metric than v4. That is not a minor tuning issue — it is a documented regression on a metric that directly affects the reliability of agentic coding tasks.
The code also references an "assertiveness counterweight" added specifically to prevent the model from becoming too aggressive in refactoring operations. This suggests the team encountered a behavior where increasing the model's capability on some tasks correlated with it making overconfident, unilateral changes — and had to manually add friction to dial that back. The false claims regression and the assertiveness counterweight are almost certainly related to the same underlying training dynamic.
For competitors who were wondering where Anthropic's current performance ceiling sits on agentic tasks, this is a fairly precise data point. It also explains why Claude Code has been showing inconsistent behavior for some users over the past few weeks — they were likely on builds that included v8 variants.
Capybara Model Versions: Known Performance Data
| Model Version | False Claims Rate | Status |
|---|---|---|
| Capybara v4 | 16.7% | Historical baseline |
| Capybara v8 | 29–30% ⚠️ | Active regression — in testing |
| Numbat | Not disclosed | Unreleased — still in testing |
Source: Leaked Claude Code source map comments, March 31, 2026. Capybara = Claude 4.6 variant; Fennec = Opus 4.6.
Undercover Mode: What We Know and What We Don't
Undercover Mode is the feature that has generated the most speculation and the least clarity. What the code confirms: Claude Code has a mode specifically designed to make "stealth" contributions to public open-source repositories. The system suppresses internal codenames and Anthropic-identifying information from appearing in git commits and code comments when the agent is operating in this mode.
What this actually means in practice is still being investigated. The benign interpretation is that it is a privacy feature for enterprise customers who do not want their AI-assisted commits to be visibly stamped as AI-generated. The more pointed interpretation — which several researchers on HN flagged — is that Anthropic has been using Claude Code to make contributions to public open-source projects without disclosing that those contributions are AI-generated or Anthropic-originated. If true, that would be a meaningful transparency problem for the open-source projects on the receiving end.
Anthropic has not commented specifically on Undercover Mode. More detail will surface as researchers continue working through the 512,000 lines. Treat any specific claims about what this mode does — in either direction — with appropriate skepticism until there is verified evidence.
The Rest: Coordinator Mode, UltraPlan, Frustration Detection, and 187 Spinner Verbs
The Coordinator Mode is a multi-agent orchestration system activated via an environment variable (CLAUDE_CODE_COORDINATOR_MODE=1). When enabled, one Claude instance becomes a coordinator that spawns and manages multiple worker agents in parallel. The internal system prompt for the coordinator explicitly states: "Parallelism is your superpower. Workers are async. Launch independent workers concurrently whenever possible — don't serialize work that can run simultaneously." Each worker gets its own scratch pad and its own tool permissions. This is the architecture that would make complex, multi-file refactors tractable at scale.
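The "don't serialize work that can run simultaneously" rule from the coordinator prompt maps directly onto Promise.all. A minimal sketch — the function name is invented; the leaked prompt describes the policy, not this code:

```typescript
// Launch every independent worker immediately, then await them together,
// instead of awaiting each one before starting the next.
async function runWorkers<T>(tasks: Array<() => Promise<T>>): Promise<T[]> {
  // tasks.map(task => task()) starts all workers up front;
  // Promise.all collects results without serializing.
  return Promise.all(tasks.map((task) => task()));
}
```

The serialized alternative — a for loop with await inside — would make a ten-worker refactor take roughly ten times the wall-clock time, which is presumably why the prompt hammers the point.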
UltraPlan is a separate 30-minute remote cloud session powered by a planning-optimized model. The idea: before executing any complex task, spin up a dedicated remote instance whose only job is to fully plan the task — checklists, dependency graphs, edge cases — and hand that structured plan to the execution agent. It adds latency up front in exchange for more reliable execution downstream. Whether the tradeoff is worth it depends entirely on the task complexity.
The frustration detection system is real and has been confirmed in the code. Claude Code monitors session language for signals that a user is becoming frustrated or losing patience. There are regex patterns in the source that flag escalating negative sentiment. What happens when those patterns fire is less clear — the code suggests it modifies how Claude processes the next prompt, possibly adjusting its tone or approach, but the exact behavior change is not fully documented in the leaked source.
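A regex-based detector of this kind is easy to sketch. The leak confirms that pattern matching over session language exists; the specific expressions below are invented, since the actual patterns were not published in full:

```typescript
// Invented example patterns for escalating negative sentiment.
// The real leaked patterns are not reproduced here.
const FRUSTRATION_PATTERNS: RegExp[] = [
  /\bwhy (isn't|won't|doesn't) (this|it) work/i,
  /\b(again|still) (broken|failing|wrong)\b/i,
  /!{3,}/, // runs of exclamation marks
];

function looksFrustrated(text: string): boolean {
  return FRUSTRATION_PATTERNS.some((pattern) => pattern.test(text));
}
```

The open question is not the detection side, which is mundane, but what the agent does with the signal once it fires.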
There is also an ongoing API waste problem that is not hidden: internal comments from March 10, 2026 note that 1,279 sessions in a single day had 50 or more consecutive autocompact failures — in some sessions, up to 3,272 consecutive failures — wasting approximately 250,000 API calls per day globally. The fix was three lines of code: after 3 consecutive failures, disable compaction for the rest of the session. That the fix is simple makes it more notable that it was burning a quarter million API calls daily before anyone caught it.
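The described fix is small enough to sketch in full. The class and names here are hypothetical; the policy itself — three consecutive failures disables compaction for the rest of the session — is from the leaked comments:

```typescript
// Hypothetical guard implementing the described circuit-breaker policy:
// after 3 consecutive compaction failures, stop trying for the session.
class CompactionGuard {
  private consecutiveFailures = 0;
  private disabled = false;

  shouldAttempt(): boolean {
    return !this.disabled;
  }

  recordResult(succeeded: boolean): void {
    this.consecutiveFailures = succeeded ? 0 : this.consecutiveFailures + 1;
    if (this.consecutiveFailures >= 3) this.disabled = true;
  }
}
```

Compare that to the reported failure mode: sessions retrying up to 3,272 times with no backstop. A basic circuit breaker is the whole fix.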
And yes, there are 187 different spinner verbs — the rotating text Claude displays while thinking. Someone at Anthropic put meaningful time into that list. It is the most humanizing detail in the entire leak.
The Clean Room Engineering Problem Nobody Has Solved
Within hours of the leak, someone forked the source and used an AI coding tool to convert the entire TypeScript codebase to Python. The argument: if the code is in a different language, it is no longer the same intellectual property — it is a clean-room re-implementation that happens to be functionally identical. This is the digital equivalent of reverse engineering with an extra step.
Traditional clean room engineering required two separate teams — one that read the original and documented the behavior in a spec, and a second that implemented the spec without ever seeing the original code. The process took months and was expensive enough that only serious competitors bothered. AI has collapsed that process from months and large teams to hours and one person with a laptop. The legal frameworks that governed clean room engineering were written for the old speed.
As Alex Kim noted in his breakdown of the leak: "The real damage isn't the code. It's the feature flags. KAIROS, the anti-distillation mechanisms: these are product roadmap details that competitors can now see and react to. The code can be refactored. The strategic surprise can't be un-leaked." That is a precise way to put it. The features themselves will ship eventually regardless. What is gone is the timeline advantage.
My Take
The framing that annoys me most in the coverage of this leak is the implicit assumption that "leaked source code" is automatically a catastrophe. It is not. Model weights are catastrophic. Training data is catastrophic. Source code for a CLI tool is, in the hierarchy of AI IP, relatively recoverable. Anthropic can ship new versions. It can refactor. The actual code is not the moat — the moat is the model behind it, and that did not leak.
What actually matters here is the operational signal. This is the second major information leak from Anthropic in one week. The first was a model spec left in an improperly cached public document. This one is a debug artifact left in a production npm package. These are not the same type of mistake, which means this is not a single bad day for one engineer. It is a pattern of basic production hygiene failures at a company that is two or three months away from what appears to be an IPO process. Investors and enterprise customers notice patterns.
The Capybara v8 false claims regression is the technical detail I keep returning to. A 29–30% false claims rate versus a previous 16.7% in v4 is not a rounding error. It is the kind of regression that suggests the model is being pushed in a direction — more capable, more autonomous, more agentic — that is creating instability on a metric that directly affects whether developers can trust the output. The "assertiveness counterweight" they added to compensate for over-aggressive refactoring is a band-aid on a training dynamic problem. I would want to know what changed between v4 and v8 before trusting Capybara in a production pipeline.
And the fake tool injection is just worth stating plainly: Anthropic was silently modifying API responses to inject false tool definitions, specifically to corrupt any training data extracted from Claude Code sessions. That is a legitimate defensive move against API scraping. It is also something that should have been publicly disclosed, because developers have a reasonable expectation that what the API returns corresponds to what the system actually has. "We may inject fake tools into your session if we think you're scraping" belongs in documentation, not buried in a feature flag that required a source code leak to surface.
Key Takeaways
- The leak was 59.8 MB of TypeScript source via an npm .map file — no model weights, no credentials, no customer data
- KAIROS is a fully built always-on background agent with nightly memory consolidation — gated but nearly complete
- Anthropic has been injecting fake tool definitions into API responses as a training data poisoning defense
- Capybara v8 (Claude 4.6 variant) shows a documented regression: 29–30% false claims rate vs 16.7% in v4
- Undercover Mode exists — its exact purpose is still being verified but involves suppressing Anthropic attribution in git contributions
- AI-accelerated clean room engineering now takes hours, not months — existing IP law is not equipped for this speed
- If you ran Claude Code via npm on March 31 between 00:21–03:29 UTC, check for the malicious axios versions separately from this leak
FAQ
Was Anthropic's underlying AI model exposed in this leak?
No. Model weights, training data, and the core model architecture were not part of this leak. What leaked was the Claude Code CLI — the TypeScript tooling harness that wraps the model. The actual model runs on Anthropic's servers and was never in the npm package. This distinction matters for assessing the actual damage.
What is a source map file and why was it in the npm package?
A .map file is a debug artifact that maps compiled, minified JavaScript back to the original source. It is standard practice to exclude these from production builds. Claude Code uses Bun as its bundler, and there is a known Bun bug — filed March 11, still open — where source maps are included in production builds by default unless explicitly disabled. Whether this bug caused the leak or a separate configuration error is not fully confirmed, but the mechanism is clear.
What is KAIROS and when will it ship?
KAIROS is a persistent, always-on background agent mode for Claude Code. It runs continuously, maintains daily observation logs, and performs memory consolidation overnight via a process called autoDream. The leaked code notes a teaser rollout planned April 1–7, 2026, with a full launch gated for May 2026 starting with Anthropic employees. Given the leak, Anthropic may adjust that timeline.
Is converting the leaked TypeScript source to Python legally safe?
Almost certainly not, but nobody knows for sure. The argument — that a language conversion produces new IP — mirrors traditional clean room engineering logic, which typically required strict process isolation to hold up legally. AI-assisted language conversion is fast and nearly automatic, and courts have not yet addressed whether this process satisfies the "clean room" standard. Anyone hosting converted versions of this code is taking on real legal exposure. The original uploader on GitHub already removed his version and pivoted to a Python feature port to reduce liability.
Should I stop using Claude Code after this?
The source code leak alone is not a reason to stop. The separate axios supply-chain attack that ran concurrently is a different story — if you installed or updated Claude Code via npm on March 31 between 00:21 and 03:29 UTC, check your lockfile for axios@1.14.1 or plain-crypto-js. Anthropic now recommends installing Claude Code via the native installer (curl -fsSL https://claude.ai/install.sh | bash) to avoid the npm dependency chain entirely.
If you want more context on where Claude Code sits in the current agentic tooling landscape, the analysis of what the Mythos model spec leak actually said covers the model-level details that complement what the source code revealed. And for a grounded look at where Anthropic's pricing compares to alternatives when running agents at scale, the Claude Max vs OpenClaw cost breakdown is worth reading before making infrastructure decisions based on leaked roadmap features.
The features in this leak — KAIROS, Coordinator Mode, UltraPlan — will ship eventually. What competitors can do with this roadmap in the meantime is the actual open question. Six months ago, "we saw their unshipped features" would have meant a few engineering teams adjusting their plans. Now it means those features could exist in a competing product before Anthropic finishes its own rollout. That speed asymmetry is the real story here, and it has nothing to do with Tamagotchi pets.