Clean room engineering is supposed to be expensive. That expense is, in a very practical sense, the mechanism by which software copyright works. You can't copy code — but you can rewrite it from scratch. The catch was always that doing so required months of work, two isolated teams, lawyers verifying separation protocols, and a budget most companies can't justify. That cost was the moat.
On March 31, 2026, that moat was crossed in two hours by a single developer who was, for a portion of it, asleep.
What the claw-code incident actually revealed has very little to do with Anthropic's source leak, or even with Claude Code specifically. The real story is structural: AI coding agents have eliminated the cost barrier that made clean room engineering a theoretical right rather than a practical weapon. And IP law, written for a world where that barrier existed, has not caught up. (If you want the full breakdown of what was inside the leaked code — KAIROS, fake tool injection, Capybara v8's false claims regression — that analysis is in our Claude Code source leak deep-dive. This article is about the legal and strategic layer on top of it.)
What Clean Room Engineering Actually Means
The term sounds more technical than it is. Clean room engineering is a legal process, not a technical one. Its purpose is to produce a functional replica of software without infringing the copyright of the original.
Copyright law protects specific code — the expression — not the behavior or architecture of what that code does. This means that if two separate developers independently write functions that do the same thing, neither infringes on the other. The idea of what software does is not ownable. Only the specific lines written to implement that idea are.
The clean room process operationalizes this principle with a strict separation structure. Team One — the "dirty team" — reads and analyzes the original software, producing a detailed functional specification. Team Two — the "clean team" — receives only that specification. They have never seen the original code. Working purely from the spec, they build new software that replicates every behavior described. Since Team Two never touched the original codebase, their output cannot be a copy. It is legally original, regardless of how functionally identical it is to what they were told to replicate.
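The separation rule is easiest to see as an information-flow constraint. The sketch below is purely illustrative, not code from any real clean room process: the "dirty team" function sees the original and emits only a behavioral spec, and the "clean team" function's signature makes the spec its sole input, so no original expression can reach the output.

```python
# Illustrative sketch of the clean room information-flow rule.
# All names here are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionalSpec:
    """What the software does: behaviors, never source expression."""
    behaviors: tuple[str, ...]

def dirty_team_analyze(original_source: str) -> FunctionalSpec:
    # Reads the original; its ONLY output is a behavioral description.
    behaviors = tuple(
        line.removeprefix("# behavior: ")
        for line in original_source.splitlines()
        if line.startswith("# behavior: ")
    )
    return FunctionalSpec(behaviors)

def clean_team_build(spec: FunctionalSpec) -> str:
    # The signature enforces separation: the spec is the only input,
    # so the output cannot copy expression the builder never saw.
    return "\n".join(
        f"def impl_{i}(): ...  # implements: {b}"
        for i, b in enumerate(spec.behaviors)
    )

original = "# behavior: parse config\n# behavior: run query\nsecret_code()"
spec = dirty_team_analyze(original)
rewrite = clean_team_build(spec)
assert "secret_code" not in rewrite  # no expression leaks through the spec
```

The legal argument maps onto the type signature: if `clean_team_build` can be shown to have received nothing but the spec, its output is original by construction.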
The most cited real-world example is Compaq in 1982. IBM had the dominant PC BIOS, and Compaq needed a compatible version without copying IBM's proprietary code. Their engineers executed exactly this two-team process, and the resulting BIOS allowed Compaq to build IBM-compatible PCs entirely legally. Courts later validated this approach, cementing clean room engineering as an accepted method for navigating software copyright.
Verdict so far: The legal theory is solid and court-tested. The thing that limited its use was always cost.
The Cost That Made It Theoretical
Clean room engineering on a complex codebase is a major undertaking. You need two fully separate developer teams — not just two individuals, but teams who have no contact with each other and can credibly demonstrate that separation if challenged in court. You need lawyers reviewing the process at each step to ensure no contamination occurs. You need time: the analysis phase, the specification writing phase, and then the implementation phase run sequentially to prevent cross-contamination. For anything beyond a small library, this is a multi-month project measured in engineer-years and legal fees.
This cost structure created a de facto protection that copyright law itself doesn't explicitly provide. In theory, any software could be clean-room cloned. In practice, the effort required made it economically viable only in situations where the market opportunity was enormous — think PC compatibility in the 1980s, or the Phoenix Technologies BIOS clone that enabled the entire IBM-compatible PC industry.
For most commercial software, the math never worked. Spending six months and significant capital to clone a competitor's product, when you could build your own from scratch or simply license the original, made little sense. The moat wasn't legal. It was economic. And it was stable for about forty years.
What Happened on March 31, 2026
The incident began with a mundane build error. Anthropic's Claude Code uses Bun as its bundler. Bun has a known bug, documented in a GitHub issue filed on March 11, 2026, where source maps appear in production builds even when they shouldn't. When Claude Code v2.1.88 shipped to the public npm registry, it included a 59.8 MB JavaScript source map file that mapped the minified production code back to fully readable, commented TypeScript. Nobody had to hack anything. The file was just there.
Security researcher Chaofan Shou noticed at 4:23 AM UTC and posted a public link. Within hours, the repository had been forked over 41,000 times. What was exposed: approximately 512,000 lines of TypeScript across 1,906 source files — the query engine, tool system, multi-agent orchestration logic, context compaction, and 44 feature flags covering functionality that hasn't yet shipped publicly. Our technical breakdown of the leaked code covers each of these in detail — KAIROS, the anti-distillation mechanism, Capybara v8's documented regression. This article focuses on what the rewrite that followed means legally.
Anthropic began issuing DMCA takedown notices against repositories hosting the original TypeScript. Standard legal response. Then they overreached — taking down legitimate forks of their own open-source repositories before publicly retracting those notices. The internet did not wait for any of this to resolve.
The 2-Hour Rewrite: What It Actually Took
Sigrid Jin — a developer previously profiled by the Wall Street Journal as one of the world's most active Claude Code users, having processed over 25 billion Claude Code tokens in the past year — woke up at 4 AM to see the DMCA wave rolling across GitHub. Rather than fork the leaked TypeScript, Jin did something different: he executed a clean room rewrite of Claude Code's entire architecture from scratch.
The tool Jin used was oh-my-codex, a workflow layer he built on top of OpenAI's open-source Codex. The process, as he later documented, wasn't Jin sitting at a terminal manually writing code. He opened Discord on his phone, typed an instruction, and put the phone down. The agents read the message, broke the work into tasks, assigned roles between themselves, wrote code, tested it, argued over failures, fixed what didn't pass, and pushed. Jin went to make coffee. The agents kept working.
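That loop can be sketched in a few lines. This is a hypothetical illustration of the coordination pattern described above, not oh-my-codex's actual API: one instruction fans out into tasks, worker "agents" run in parallel, and failed tasks are retried on the next pass. Every name here is invented.

```python
# Hypothetical sketch of parallel agent coordination with retries.
from concurrent.futures import ThreadPoolExecutor

def plan(instruction: str) -> list[str]:
    # A planner agent would decompose the instruction; here we fake it.
    return [f"{instruction}: module {m}" for m in ("cli", "query", "tools")]

def run_task(task: str, attempt: int) -> bool:
    # A worker agent would write code and run tests; we simulate one
    # module failing its first pass to mimic "fixed what didn't pass".
    return attempt > 0 or "tools" not in task

def coordinate(instruction: str, max_retries: int = 2) -> dict[str, int]:
    tasks, passed = plan(instruction), {}
    with ThreadPoolExecutor() as pool:
        for attempt in range(max_retries + 1):
            pending = [t for t in tasks if t not in passed]
            for task, ok in pool.map(lambda t: (t, run_task(t, attempt)),
                                     pending):
                if ok:
                    passed[task] = attempt  # record which pass succeeded
    return passed

done = coordinate("rewrite harness")
assert len(done) == 3  # every task eventually passed
```

The structural point is in the `pool.map` call: nothing in this loop is sequential across tasks, which is exactly the property that breaks the old engineer-years cost model.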
The resulting project — claw-code — is a Python-and-Rust rewrite of Claude Code's agent harness architecture. It reimplements the core modules: the CLI entry point, query engine, runtime session management, tool execution layer, permission context system, and multi-agent orchestration. The repository includes a parity_audit.py file that explicitly tracks what the rewrite hasn't caught up to yet — a transparency choice that's more honest than pretending completeness.
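The idea behind such a parity file can be shown in miniature. This is a hypothetical sketch, not claw-code's actual `parity_audit.py`: mechanically diff a checklist of reference behaviors against what the rewrite implements, so the gap is stated rather than hidden.

```python
# Hypothetical sketch of a parity audit: reference vs. implemented.
REFERENCE = {  # behaviors of the original, per a functional spec
    "cli entry point", "query engine", "session management",
    "tool execution", "permission context", "multi-agent orchestration",
    "context compaction",
}
IMPLEMENTED = {  # what the rewrite covers so far
    "cli entry point", "query engine", "session management",
    "tool execution", "permission context", "multi-agent orchestration",
}

def parity_report(reference: set[str], implemented: set[str]) -> dict:
    """Coverage ratio plus an explicit list of what is still missing."""
    missing = sorted(reference - implemented)
    return {"coverage": len(implemented & reference) / len(reference),
            "missing": missing}

report = parity_report(REFERENCE, IMPLEMENTED)
print(f"parity: {report['coverage']:.0%}, missing: {report['missing']}")
```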
Within two hours of publication, claw-code reached 50,000 GitHub stars, a platform record for speed to that milestone. Within days, it surpassed 72,000 stars and 72,600 forks. Claw-code is also notable for being model-agnostic: unlike the original Claude Code, which is locked to Anthropic's models, claw-code accepts any model via its configuration. The architectural moat and the model lock-in were separated in two hours.
Importantly, this is not a copy of the leaked TypeScript. It is a clean room rewrite — functionally equivalent, architecturally similar, but not derived from Anthropic's code. Under the same legal framework that validated Compaq's IBM-compatible BIOS in 1982, Anthropic has no copyright claim over it. The DMCA cannot reach it.
Verdict: What used to require a legal team, two separated developer groups, and months of work was executed by one person with an agent coordination system in the time it takes to watch a movie.
IP Law Written for 1982
The legal framework for software copyright was largely shaped by cases and statutes from the 1980s and 1990s. At that time, clean room engineering was expensive enough that it only made sense for massive commercial stakes — IBM BIOS compatibility, or Oracle's Java API (the subject of a decade-long legal battle that only concluded in 2021). The assumption baked into the legal structure wasn't written down anywhere, but it was real: the cost of clean room replication was high enough to function as a practical barrier, even if the legal barrier was not absolute.
That assumption is now wrong. And the law hasn't registered this yet.
As Plagiarism Today analyzed in late March 2026 — before the Claude Code incident made this concrete — AI-assisted clean room rewriting creates a category of questions that existing copyright doctrine simply wasn't built to answer. Can an AI serve as the "clean team"? If the AI was trained on the original codebase (and most large language models have been trained on most publicly available open-source code), does that constitute contamination? Does the human directing the AI count as "having seen the original" even if the human never read a single line?
There is another wrinkle specific to this case. Anthropic's CEO has previously indicated that significant portions of Claude Code were written by Claude. The DC Circuit upheld in March 2025 that AI-generated work does not automatically carry copyright protection — it is, in effect, public domain. If substantial parts of Claude Code's codebase were AI-generated and therefore not copyrightable, Anthropic's legal position over the original leak weakens considerably, and the argument against the clean room rewrite weakens further still. This connects directly to the Capybara v8 findings in the source code leak analysis — a codebase co-written by the model it's wrapping occupies genuinely novel legal territory.
Verdict: The legal framework is operating on assumptions that expired sometime in 2025. Nobody has updated it.
The Legal Questions Nobody Has Answered Yet
The claw-code situation has surfaced several genuinely open legal questions that courts have not addressed. They are worth stating plainly, because they affect every software company, not just Anthropic.
Does an AI-assisted clean room rewrite count as clean? Traditional clean room methodology requires human teams with demonstrable separation. When an AI model serves as the "clean team," the model's training data — which almost certainly includes the software being replicated — complicates the picture. Is training-data exposure the same as the human "dirty team" contaminating the "clean team"? No court has ruled on this.
What happens when the original software was partly AI-generated? If the copyrightable portion of a codebase is smaller than commonly assumed, the entire DMCA enforcement strategy over leaked AI-generated software becomes legally murky. This is not a hypothetical — it is a live question for Anthropic right now.
Can DMCA reach decentralized storage? The original leaked TypeScript was also uploaded to IPFS — the InterPlanetary File System — with all telemetry stripped and experimental features unlocked. Whether DMCA takedown requests can legally reach content stored on a distributed network like IPFS is entirely unresolved. Platform-based DMCA processes assume a central host to contact. IPFS has no such host.
Does speed change the legal analysis? Courts have historically been more skeptical of clean room claims when the rewrite was suspiciously fast and clearly inspired by a specific incident. Two hours, in direct response to a source code leak, is an unusual fact pattern for a clean room defense. Whether that matters legally is untested.
None of these questions have answers yet. The claw-code repository is still live. Anthropic has not announced litigation. The legal battle — if there is one — hasn't started.
What This Means for Software Companies
If the claw-code legal position holds — and the early consensus from IP-focused observers like Gergely Orosz is that it likely does — then a specific kind of competitive moat has permanently changed.
Previously, a company could be reasonably confident that its proprietary codebase was protected by a combination of copyright law and the economic cost of clean room replication. The legal protection was real. The economic cost made it largely moot to test. Together, they functioned as a meaningful barrier.
That calculus has changed. The economic cost is now near zero for a sufficiently capable agent coordination system. What remains is only the legal question, and that question is currently unsettled on multiple fronts — AI training contamination, AI-generated code's copyright status, decentralized storage reach, and the speed-of-replication issue.
The companies most affected are those whose competitive advantage was primarily architectural — whose moat was the harness, the orchestration logic, the specific way they connected tools and agents — rather than the model weights themselves. Weights are harder to replicate in two hours. Architecture, apparently, is not. For users thinking about this from a cost angle, the Claude Max vs OpenClaw cost breakdown is a useful frame for how quickly the model-agnostic harness layer changes the economics of running AI coding agents at scale.
My Take
The coverage of this incident has been almost entirely about the drama: leaked code, DMCA wars, irony stacked on irony, a Tamagotchi easter egg that inadvertently exposed 512,000 lines of proprietary TypeScript. All of that is real and interesting. None of it is the actual story.
The actual story is that a cost barrier collapsed, and the legal structure that quietly depended on that barrier hasn't noticed yet. Clean room engineering was not expensive because anyone chose to make it expensive. It was expensive because it required human labor in sequence (analyze, specify, build) with verifiable separation at each step. AI agent coordination systems don't work sequentially. They parallelize. They don't need to be separated from themselves. And they are not billed by the hour in any way that maps onto the traditional economics of this process.
What concerns me isn't claw-code specifically. It's the general pattern. If a developer with a Discord channel and an agent coordination system can clean-room clone a 512,000-line proprietary codebase in two hours, then that same capability exists for any codebase. The incident at Anthropic happened to be visible because of the source map leak. The underlying capability would have existed regardless. Anyone with access to the right agent tooling and sufficient knowledge of what a target system does — from documentation, from user reports, from API behavior — could theoretically run this process without a leak ever happening.
I don't think the legal system will resolve these questions cleanly or quickly. Copyright law moves slowly. The AI copyright status question is already years behind the technology. My expectation is that software companies will spend the next several years operating with genuine uncertainty about what their code is actually worth as a legal asset — and that the ones who treat architecture as a moat without also competing on model quality, distribution, or ecosystem lock-in will find that moat draining faster than they planned for.
Key Takeaways
- Clean room engineering's cost barrier — the practical moat that protected most software IP — has been effectively eliminated by AI agent coordination systems.
- The legal framework for clean room validity was written for human teams over months. Courts have not yet addressed AI-assisted clean room rewrites.
- If significant portions of the original codebase were AI-generated, copyright protection over it may be weaker than assumed under current US law.
- DMCA enforcement has no tested mechanism for reaching content stored on decentralized networks like IPFS.
- The competitive moat most at risk is architectural harness code — not model weights, which remain harder to replicate.
- Software companies should expect several years of legal ambiguity before courts establish doctrine on these questions.
FAQ
What is clean room engineering, and why does it matter legally?
Clean room engineering is a process where one team analyzes existing software and creates a functional specification, then a completely separate team builds new software from that specification alone — without ever seeing the original code. The result is functionally identical but legally original, because copyright protects specific code expression, not what software does. Courts validated this approach in cases like Compaq's IBM BIOS clone in the 1980s. It matters because it determines how much protection proprietary software actually has in practice.
Is claw-code actually legal?
Under the traditional legal framework for clean room engineering, it likely is — with some unresolved caveats. The project doesn't copy Anthropic's TypeScript; it reimplements the same architecture in Python and Rust. Copyright protects code expression, not architecture. However, whether an AI-assisted rewrite fully satisfies the "clean team" separation requirement is an open legal question that courts have not yet addressed. As of April 2026, Anthropic has not announced legal action against claw-code specifically.
Why couldn't Anthropic simply DMCA claw-code?
DMCA takedowns target copyright infringement — reproducing someone else's specific code. Claw-code is a rewrite, not a copy. Anthropic could only prevail in a DMCA claim if it could demonstrate that claw-code contains their original expression, which it is designed specifically not to do. Additionally, Anthropic already overreached in its initial DMCA wave, taking down legitimate forks of its own open-source repositories before retracting those notices — which weakens the perception of a strong legal position.
What does "AI-generated code may not be copyrightable" mean for Anthropic?
The US Copyright Office and the DC Circuit have established that copyright requires human authorship. AI-generated content, including code, does not automatically qualify for protection. If a significant portion of Claude Code was written by Claude — which Anthropic has implied — then those portions may be legally unprotectable regardless of the leak. This doesn't mean the entire codebase lacks copyright; human-authored portions are still protected. But it complicates any enforcement strategy significantly.
Does this mean any software can be cloned in two hours now?
Not exactly, but the cost barrier is lower than it has ever been. The claw-code rewrite was assisted by having the leaked source available for architectural reference. In cases where source code is not leaked, a clean room rewrite requires understanding the target software's behavior from its public interface and documentation — harder, but increasingly feasible with capable agents and sufficient domain knowledge. The two-hour figure is specific to this incident. The general direction — toward dramatically lower cost for clean room replication — is not.
What should software companies do differently after this?
Treat the build supply chain as seriously as the IP it ships. The Claude Code leak happened because of a known build tooling bug, a missing entry in .npmignore combined with a Bun bundler issue, not a security breach. Build pipeline hygiene, source map exclusion verification, and output auditing before every release are now first-order concerns. Beyond that, companies whose competitive advantage sits specifically in architecture and harness logic need to reckon with the fact that that layer now has a lower practical protection threshold than it did a year ago.
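One concrete hygiene check is easy to automate. The sketch below assumes the JSON output shape of `npm pack --dry-run --json` (npm 7+), which lists the files a publish would include; the pure helper flags any source maps before anything reaches the registry.

```python
# Pre-publish audit: refuse to ship a package containing source maps.
import json
import subprocess

def find_source_maps(paths: list[str]) -> list[str]:
    """Return any files that would leak a source map into the package."""
    return sorted(p for p in paths if p.endswith(".map"))

def audit_npm_package() -> list[str]:
    # `npm pack --dry-run --json` reports the publish file list without
    # producing a tarball; we scan that list for .map files.
    out = subprocess.run(["npm", "pack", "--dry-run", "--json"],
                         capture_output=True, text=True, check=True).stdout
    files = [f["path"] for f in json.loads(out)[0]["files"]]
    return find_source_maps(files)

# Demo of the pure check (no npm needed):
demo = find_source_maps(["dist/cli.js", "dist/cli.js.map", "README.md"])
assert demo == ["dist/cli.js.map"]
```

Wired into CI as a required step, `audit_npm_package()` failing the build is exactly the kind of output auditing that would have caught a 59.8 MB stray `.map` file before it shipped.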
The honest caveat is that nobody knows how this plays out legally. It is entirely possible that courts, when eventually faced with AI-assisted clean room cases, find reasons to restrict the doctrine's application — through training data contamination arguments, through the speed-of-replication factor, or through requirements that future AI clean room processes maintain demonstrable separation between analysis and implementation phases.
What is not uncertain is the technical direction. Agent coordination systems will get faster and more capable. The cost of replicating software behavior will continue to fall. Whether the law catches up to this reality — and in which direction — will shape how software IP works for the next decade. Any software company that is certain about its legal position right now should probably read more carefully.