| At a glance | |
| --- | --- |
| Spud pre-training | Completed March 24, 2026 |
| Public release | Weeks away (April 2026 est.) |
| Sora | Discontinued — compute redirected |
| Tao's 2023 prediction | AI will be a "trustworthy co-author" by 2026 |
"Accelerate the economy." That is the phrase Sam Altman used internally to describe what OpenAI's next model — codenamed Spud — can do. CEOs say things like that all the time. But two events happened within the same week that make Altman's claim worth taking seriously rather than dismissing as the usual Silicon Valley hype cycle.
First: OpenAI confirmed Spud has completed pre-training as of March 24, 2026. Second: Terence Tao — widely regarded as the greatest living mathematician — published a paper on arXiv in which one of the key proofs was carried by ChatGPT Pro. Not assisted. Carried. He split the problem in two, solved one half himself, and handed the other to an AI model. That half came back proven.
These are not the same story. But they connect. And the connection is worth examining carefully — without the breathlessness that normally surrounds OpenAI announcements.
What exactly is Spud — and why does the name matter less than the timing?
Spud is OpenAI's internal codename for its next major model, which completed pre-training on approximately March 24, 2026, according to a report by The Information. The name is a placeholder — the same way Google's Android releases were desserts before they weren't. Whether this ends up being called GPT-5.5 or GPT-6 is currently unknown. OpenAI has not confirmed either naming convention.
What is confirmed: Altman described it internally as a "very strong model" that could "really accelerate the economy." Those are not phrases he typically deploys for incremental updates. When OpenAI ships a minor model improvement, the language is usually cautious and technical. "Accelerate the economy" sits in a different register entirely. It is the language of systems, not products — of macro impact, not feature lists.
According to the lifearchitect.ai GPT-6 tracker, which monitors official statements and credible third-party reports, Spud is being trained at the Stargate facility in Abilene, Texas — the first Stargate site — using over 100,000 H100 GPUs. The parameter count has not been disclosed. Nor is it known whether Spud is a reasoning model, a non-reasoning model, or something with a new architecture entirely. OpenAI employees have hinted that Spud contains a capability that is "very different from what we've seen before," though no specifics have been shared publicly.
A public release is expected within weeks — possibly late March or April 2026. Given OpenAI's recent release cadence (five GPT-5 variants in under seven months), there is no particular reason to expect a long delay between completed pre-training and deployment.
Why did OpenAI kill Sora to build Spud?
This is the most structurally interesting decision in the entire story. Sora, OpenAI's video generation model, required a shocking amount of compute — reportedly surprising even senior employees when the numbers were surfaced internally. The company made a straightforward calculation: that compute was more valuable powering Spud than generating video.
The consequences were significant. OpenAI had reportedly been in discussions with Disney over a licensing deal involving Sora, said to be structured around a $1 billion commitment for AI-powered production. That deal never officially closed, but it was far enough along that Sora's cancellation came as news to Disney, which issued a vague public statement acknowledging OpenAI's right to pursue its own direction. In plain terms: they were blindsided.
Video generation is now off OpenAI's product roadmap entirely. The Sora research team, led by Bill Peebles, has been redirected toward what Peebles described as "systems that deeply understand the world by learning to simulate arbitrary environments at high fidelity." That is the language of world models for robotics — the same direction Google DeepMind and Nvidia have been publicly pursuing. Altman confirmed in internal communications that the Sora team would prioritize long-term world simulation research, particularly as it applies to robotics.
The decision signals that OpenAI has concluded that creative media generation is a lower-priority battleground than enterprise productivity and physical economy automation. Whether that assessment holds once competitors ship competitive video tools is another question.
What is the "super app" Spud is supposed to power?
OpenAI is building what internally gets described as a super app — a single platform combining ChatGPT, Codex (the coding agent), Atlas (the web browsing tool), and other existing products into one unified interface. The model powering all of it is Spud.
This is not a small integration project. Each of these tools currently operates with its own interface, infrastructure, and in some cases its own model. Combining them under a single Spud-powered experience would mean that a user could move from conversational query to code generation to live web research to agentic task execution within one session, one interface, one model context. That is materially different from the current fragmented experience.
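Nothing public describes how this unification would work under the hood, so the sketch below is deliberately hypothetical: every name in it (Session, chat, codex, atlas, agent) is a stand-in invented for illustration, not an OpenAI API, and the keyword routing is a toy placeholder for whatever the model itself would do.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the tools named above; none of these
# names or signatures correspond to real OpenAI APIs.
def chat(turn, ctx):  return "conversational answer"
def codex(turn, ctx): return "generated code"
def atlas(turn, ctx): return "web research summary"
def agent(turn, ctx): return "task executed"

@dataclass
class Session:
    """One conversation whose context every tool shares."""
    context: list[str] = field(default_factory=list)

    def handle(self, user_turn: str) -> str:
        self.context.append(f"user: {user_turn}")
        tool = self._route(user_turn)           # pick a capability...
        result = tool(user_turn, self.context)  # ...but reuse one context
        self.context.append(f"{tool.__name__}: {result}")
        return result

    def _route(self, turn: str):
        # Toy keyword routing; in a real product the model would decide.
        if "code" in turn:   return codex
        if "browse" in turn: return atlas
        if "task" in turn:   return agent
        return chat

s = Session()
s.handle("write code for a CSV parser")        # handled by codex
s.handle("browse for the latest pandas docs")  # handled by atlas, same context
```

The detail that matters is the shared `context` list: every tool reads and appends to the same history, which is what would separate a genuine super app from today's collection of separate interfaces.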
It is also a direct competitive response. Anthropic already ships Claude Code and Cowork, the latter of which lets users control AI agents from a phone. OpenAI's super app ambition appears to be a recognition that the frontier is moving from model benchmarks to integrated workflows. The company has also acquired the creator of OpenClaw (an open-source Claude framework), presumably to accelerate the agentic side of the super app buildout.
Fiji Simo, a senior OpenAI executive, has renamed her division in the organizational chart to "AGI Deployment." This is the first time "AGI" has appeared anywhere in OpenAI's formal org chart or as a product category. That renaming is not incidental to the super app story — it is the framing for it.
What does the safety team restructure actually signal?
Altman has stepped back from direct oversight of OpenAI's safety and security teams. The safety team now reports to Mark Chen; the security team now reports to Greg Brockman within scaling operations. Altman's stated reason is infrastructure — he needs to focus on chips, data centers, and supply chains at unprecedented scale.
Read charitably: this is a logical division of responsibilities as the company scales. Safety is a technical function that benefits from dedicated executive ownership closer to the research teams. Altman freeing himself from day-to-day safety oversight to focus on the physical infrastructure supporting Stargate is a reasonable operational choice.
Read skeptically: this is the CEO of an AI lab that has publicly staked its identity on safety, removing himself from safety oversight at precisely the moment the company is about to deploy what it is hinting could be near-AGI-level capability. The timing is notable. Whether the new reporting structure maintains equivalent rigor is something that only becomes clear over time, not from an org chart announcement.
Both readings can be simultaneously true. Organizations restructure for multiple reasons at once. The question worth watching: what do OpenAI's external safety communications look like in the months following Spud's release?
What did Terence Tao actually let ChatGPT prove?
On March 23, 2026, Tao posted a paper to arXiv titled "Local Bernstein theory, and lower bounds for Lebesgue constants." It addresses a problem originally posed by Paul Erdős about Lagrange interpolation — a classical but unresolved area of approximation theory. As reported by Aihola, buried in the paper's exposition is a specific acknowledgment: one of the key inequalities was proved by ChatGPT Pro.
In his accompanying blog post, Tao described the process in detail. He had reduced the main problem to a toy inequality involving trigonometric polynomials. Google's AlphaEvolve confirmed numerically that the inequality appeared likely to be true — that sinusoids were likely the extremizer — but could not produce a rigorous proof. Tao couldn't find one either. So he fed the problem to ChatGPT Pro.
The model identified it as an approximation theory problem and returned a duality-based proof built on the Fourier expansion of the square wave. Tao then adapted that proof to handle functions of global exponential type — contributing roughly half of what was needed to establish the main inequality. The other half, a matching lower bound, required different methods, which Tao handled himself.
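For background, the classical Fourier expansion of the square wave, which the article above describes as the foundation of that duality argument, is the standard identity below. (The actual inequality ChatGPT proved lives in Tao's paper and blog post and is not reproduced here.)

```latex
\[
  \operatorname{sgn}(\sin x)
  \;=\;
  \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{\sin\bigl((2k+1)x\bigr)}{2k+1},
  \qquad x \in \mathbb{R}.
\]
```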
This is the correct way to evaluate what happened. ChatGPT did not solve a Fields Medal-level problem. It recognized a pattern in a well-scoped sub-problem, applied a known technique (duality via Fourier analysis), and returned something Tao could verify and build on. That is precisely what a capable research assistant does. The Fields Medal-level contribution was in framing the problem, decomposing it correctly, and handling the half that required genuine novelty.
What makes this significant is not the specific proof. It is that Tao's prediction from 2023 — that by 2026, AI would be a "trustworthy co-author" in mathematical research — appears to be arriving ahead of schedule. As recently as 2024, he was rating then-current models at the level of "a mediocre, but not completely incompetent, graduate student." The delta between that 2024 assessment and publishing a paper where the AI contributes a core proof is substantial, and it happened in under two years.
How does AlphaEvolve fit into all of this?
AlphaEvolve is Google DeepMind's evolutionary coding and research agent, built on top of Gemini. It operates by generating many candidate solutions to a problem, evaluating them against a fitness function, and iteratively refining the ones that perform best — essentially mimicking biological evolution in a software loop. Tao and Google DeepMind researcher Javier Gómez-Serrano have been working with AlphaEvolve since at least May 2025, when they co-authored a paper on mathematical exploration using the tool.
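The loop AlphaEvolve runs is easier to see in code than in prose. The sketch below is the generic evolutionary pattern only, not AlphaEvolve's actual implementation or API; `fitness`, `mutate`, and the population size are illustrative placeholders, and in the real system the mutation step is performed by Gemini rewriting candidate programs.

```python
import random

def evolve(seed, fitness, mutate, population_size=50, generations=200):
    """Generic evolutionary loop: generate variants, score them, keep the best.

    `fitness` scores a candidate (higher is better) and `mutate` produces
    a variant. This is only the skeleton of the pattern described above.
    """
    population = [seed]
    for _ in range(generations):
        # Generate variants of randomly chosen survivors.
        variants = [mutate(random.choice(population))
                    for _ in range(population_size)]
        # Score everything and keep only the fittest candidates.
        population = sorted(population + variants,
                            key=fitness, reverse=True)[:population_size]
    return population[0]

# Toy usage: "evolve" a number toward the maximizer of -(x - 3)^2.
best = evolve(seed=0.0,
              fitness=lambda x: -(x - 3.0) ** 2,
              mutate=lambda x: x + random.gauss(0, 0.5))
print(round(best, 2))  # ~3.0
```

Keeping the top performers and mutating them again is the whole trick; applied to programs rather than numbers, with an LLM as the mutation operator, it becomes a search over code.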
In the Bernstein paper, AlphaEvolve served a different function than ChatGPT. It acted as a numerical oracle — rapidly testing whether an inequality was likely true before a rigorous proof existed. This is valuable because it tells a mathematician which direction is worth pursuing before investing time in proof construction. AlphaEvolve confirmed the inequality, which then gave Tao the confidence to seek a proof. ChatGPT found the proof.
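That numerical-oracle role is simple to illustrate. The sketch below probes a stand-in inequality, the classical bound |sin(nx)| ≤ n·|sin(x)|, rather than anything from Tao's paper; the point is the workflow of cheap falsification before expensive proof.

```python
import math
import random

def survives_numerical_test(n: int, trials: int = 100_000) -> bool:
    """Randomly probe |sin(n*x)| <= n*|sin(x)| for counterexamples.

    A pass is not a proof; it only signals that the inequality is worth
    the effort of proving, which is the role AlphaEvolve played here.
    """
    for _ in range(trials):
        x = random.uniform(-math.pi, math.pi)
        if abs(math.sin(n * x)) > n * abs(math.sin(x)) + 1e-9:
            return False  # counterexample found: abandon this direction
    return True

print(all(survives_numerical_test(n) for n in range(1, 8)))  # True
```

AlphaEvolve's actual search is far more sophisticated than random sampling, but the division of labor is the same: falsify cheaply first, prove expensively second.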
This division of labor — numerical exploration via AlphaEvolve, proof generation via LLMs, formal verification via tools like Lean, and human direction throughout — is the emerging architecture of AI-assisted mathematics. None of these tools replaces the others. None replaces the mathematician. But as documented by Technology.org, GPT-5.2 alone has already made meaningful autonomous progress on at least eight Erdős problems that had resisted human attention for decades.
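One concrete look at the formal-verification leg: the Lean theorem below is a trivial stand-in, not anything from Tao's paper, but it shows the property that makes Lean a verification layer rather than another oracle. The file compiles only if the proof term actually proves the statement.

```lean
import Mathlib

-- Trivial stand-in theorem: squares of reals are nonnegative.
-- Lean's kernel checks the proof term `sq_nonneg x`, so acceptance
-- of this file is a machine-verified guarantee, not an opinion.
theorem toy_verified (x : ℝ) : 0 ≤ x ^ 2 := sq_nonneg x
```

In the pipeline described above, a proof drafted by an LLM can be transcribed into this form, after which trusting the result no longer requires trusting the model.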
Tao's own assessment, shared on Mastodon, is characteristically careful: AI systems are well-suited for the "long tail" of obscure problems where standard techniques apply but no one had bothered to try them systematically. The hard problems — the ones requiring new mathematical structures and concepts — still belong to humans. For now. The phrase "for now" is doing significant work in that sentence.
Is OpenAI about to call something AGI?
The organizational evidence points in an interesting direction. Fiji Simo's division is now officially called "AGI Deployment." Altman's internal language around Spud — "accelerate the economy" — is the kind of framing associated with systemic, not incremental, capability. And the ARC-AGI-3 benchmark launched this week, with frontier models currently scoring 0.37% against a human baseline of 100% — a benchmark specifically designed to remain difficult for AI. No lab courts comparison against a benchmark it expects to fail immediately unless it anticipates near-term progress on it.
There is a plausible reading of all these signals together: OpenAI may be preparing to describe Spud — or whatever it is ultimately named — as the first model that qualifies as AGI under its internal definition. OpenAI's Charter defines AGI as highly autonomous systems that outperform humans at most economically valuable work. That bar is specific, and the organizational rename to "AGI Deployment" suggests the company may believe it is closer than the external conversation assumes.
That said: OpenAI has built-in financial incentives to make this declaration. Its partnership with Microsoft includes provisions where certain contractual terms shift if OpenAI achieves AGI. Skepticism about the timing and framing of any such declaration is appropriate. The capability question and the labeling question are separate. A model can be genuinely impressive without meeting the definition of AGI that most AI researchers would accept.
For context, the current ARC-AGI-3 scores across frontier models, as reported in our earlier analysis of the ARC-AGI-3 launch, suggest that on the most rigorous public test of general reasoning, the gap between current models and human performance remains enormous. That context matters when evaluating AGI framing from any lab.
"We covered the technical breakdown of what ChatGPT actually proved in Tao's paper in a separate, more detailed analysis — [read it here]."
My Take
The Tao story and the Spud story are being reported as separate items. They should not be. They are the same story told from two different angles — one from the frontier lab pushing capability, one from the frontier scientist using it. When you look at them together, a specific pattern emerges that is harder to dismiss than either piece alone.
Here is what the skeptic in me notices about the Spud announcement: we know almost nothing concrete. No parameter count, no architecture details, no benchmarks, no capability demonstrations. We have one internal memo's worth of Altman-isms and an org chart rename. In isolation, that is worth exactly as much as any other CEO's pre-launch hype — which is to say, approximately zero. OpenAI has been in "Code Red" mode since at least December 2025, racing against Anthropic and Google. Companies in Code Red mode have strong incentives to announce aggressively.
What makes me take this more seriously than usual is the Tao data point. Tao is not employed by OpenAI. He is not an investor. He has spent years providing careful, skeptical, incremental assessments of AI capability in mathematics — and his assessments have consistently proven more accurate than either the maximalist boosters or the minimalist skeptics. When the greatest living mathematician publishes a paper with a core proof credited to an AI model and describes it as unremarkable — as something that now "saves more time than it wastes" — that is signal, not noise.
The honest caveat is this: I do not know if Spud will deliver what the internal framing promises. I do not know if the "accelerate the economy" language reflects actual capability or strategic positioning ahead of OpenAI's IPO. Both can be partly true. What I am more confident in is the direction of travel. The Tao paper is a peer-reviewed, publicly verifiable data point from someone with nothing to gain from overstatement. That, more than any Altman memo, is the piece of evidence I would weight most heavily when thinking about where this is all going.
- OpenAI's "Spud" model completed pre-training on approximately March 24, 2026. Release expected within weeks.
- Sam Altman described Spud internally as capable of "really accelerating the economy" — unusually strong language for a product release.
- Sora has been discontinued. Its compute has been redirected to Spud. A reported Disney deal is now dead.
- The Sora team pivots to world models for robotics — a direct move into Google DeepMind and Nvidia's territory.
- Terence Tao published a math paper on March 23 where ChatGPT Pro carried one half of a core proof.
- AlphaEvolve (Google DeepMind) acted as the numerical confirmation layer; ChatGPT acted as the proof-generation layer.
- Tao's 2023 prediction — AI as "trustworthy co-author" by 2026 — appears to be arriving ahead of schedule.
- OpenAI's org chart now includes "AGI Deployment" as a formal category for the first time.
- ARC-AGI-3, the most rigorous public reasoning benchmark, shows frontier models at 0.37% vs. human 100% — the gap remains large regardless of internal framing.
FAQ
What is OpenAI Spud?
Spud is the internal codename for OpenAI's next major language model. It completed pre-training in late March 2026 and is expected to be released publicly within weeks. Whether it will be named GPT-5.5, GPT-6, or something entirely different has not been confirmed by OpenAI.
Why did OpenAI shut down Sora?
Sora was consuming a large share of OpenAI's compute resources. The company redirected that compute toward training Spud and building its planned super app. Video generation has been removed from OpenAI's product roadmap entirely. The Sora team has been reassigned to world models and robotics research.
What did Terence Tao's math paper have to do with AI?
Tao's March 2026 paper on Lebesgue constants explicitly credits ChatGPT Pro with generating a duality-based proof for a key inequality. Google's AlphaEvolve was used earlier in the process to numerically confirm the inequality's likely truth. Tao handled the second half of the problem himself.
Is OpenAI about to announce AGI?
OpenAI's org chart now formally includes "AGI Deployment" as a category — a first. The internal framing around Spud uses unusually systemic language. However, on the most rigorous public benchmark (ARC-AGI-3), frontier models score 0.37% against human 100%. Any AGI declaration would need to be evaluated against both OpenAI's specific internal definition and the broader scientific consensus on what AGI means.
What is AlphaEvolve and how is it different from ChatGPT?
AlphaEvolve is Google DeepMind's evolutionary coding agent, built on Gemini. It generates many candidate solutions, tests them against a fitness function, and refines the best performers — mimicking biological evolution. It excels at numerical optimization and exploring large solution spaces. ChatGPT is a general language model that can recognize problem types and generate proofs or solutions based on pattern recognition across its training data. In the Tao paper, both were used for different stages of the same problem.
How many Erdős problems has AI solved?
Tao maintains a GitHub wiki tracking AI progress on Erdős problems. By early 2026, eight problems had seen meaningful autonomous AI progress. Six additional cases involved AI finding and building on existing research. GPT-5.2 Pro reportedly solved three Erdős problems in a single week shortly after its January 2026 release. Tao has noted that AI is particularly well-suited for the "long tail" of problems where standard techniques apply but no human had systematically attempted them.
What is OpenAI's super app?
OpenAI is building a unified platform that combines ChatGPT, Codex (coding agent), Atlas (web browsing), and other tools into one interface. Spud is expected to be the model powering this combined experience. The goal is to allow users to move between conversational AI, code generation, web research, and agentic task execution in a single session without switching tools.
The honest position is this: Spud may deliver everything Altman's internal framing suggests, or it may be another capable but incremental model that gets AGI-adjacent marketing applied to it ahead of an IPO. What is harder to dispute is the Tao data point — because Tao has no financial incentive to overstate AI's contribution to his own work, and every professional incentive to understate it. His willingness to credit ChatGPT with a core proof in a peer-reviewed paper is a more meaningful signal than any number of lab announcements. The thing to watch is not the Spud launch itself. It is what happens when Spud reaches the researchers, mathematicians, and knowledge workers who will actually use it — and whether the results they report match the framing that preceded it.