March 23, 2026 — arXiv
"2026-level AI will be a trustworthy co-author"
AI = "mediocre but not incompetent grad student"
AI "now saves more time than it wastes"
One inequality. That is the precise scope of what ChatGPT Pro contributed to Terence Tao's latest paper. Not the theorem. Not the framework. Not the problem statement. One specific inequality that Tao had already isolated as a self-contained subproblem — and could not prove himself.
That framing matters. Because the coverage of this event has split into two camps that are both wrong: one claiming AI just solved a Fields Medal-level problem, the other dismissing it as nothing because the AI "only" handled a sub-problem. The actual story sits in neither place — and it is more interesting than either camp's version.
On March 23, 2026, Tao posted "Local Bernstein theory, and lower bounds for Lebesgue constants" to arXiv (tagged math.CA and math.CV). In the paper and its accompanying blog post, he explicitly credits ChatGPT Pro with a duality-based proof. Then he goes a step further: he describes using AI for "several other secondary tasks, such as literature review, proofreading, and generating pictures" — and adds that these applications have "matured to the point where using them is almost mundane." That second sentence is the one worth reading twice.
What actually happened in the paper
The paper addresses a classical problem in approximation theory, originally posed by Paul Erdős, about the growth rate of Lebesgue constants for Lagrange interpolation at arbitrary node sets. Most of the paper is, by Tao's own description, standard work — elegant modifications of classical arguments from Bernstein, Boas, Duffin, and Schaeffer. The core techniques are well-established.
One bound resisted him. He had reduced the main problem to a toy inequality about trigonometric polynomials — a simpler, self-contained statement that, if proved, would unlock the larger argument. He suspected the extremizer (the function that makes the inequality tight) was a sinusoid, but could not find a rigorous proof of the bound in that case.
He turned to Google DeepMind's AlphaEvolve first. It confirmed numerically that sinusoids appeared to be the extremizer — providing evidence that the approach was worth pursuing — but could not produce a proof. Then he fed the toy inequality to ChatGPT Pro.
According to Tao's blog post, the model identified the problem as one in approximation theory and returned a duality-based argument built on the Fourier expansion of the square wave. Tao then extended that proof to functions of global exponential type — replacing the Fourier manipulations with contour-shifting arguments — and completed the paper. In his words, he was "led to a way to prove the toy model's integral bound by splitting it into two lower bounds," and ChatGPT proved the first while he proved the second.
He also explicitly credited the Nevanlinna two-constant theorem as the relevant classical result, adding that he "was not previously aware" of it before the model's response — suggesting ChatGPT located a theorem from the literature that Tao, for all his breadth, did not have in immediate recall.
What AlphaEvolve did — and what it could not do
AlphaEvolve is Google DeepMind's evolutionary agent, built on Gemini. It works by generating large numbers of candidate solutions to a problem, evaluating each against a verifiable fitness function, and iterating on the best performers. This approach — closer to search than language modeling — is well-suited for numerical verification: you can write a fitness function that checks whether a proposed extremizer satisfies the inequality, and let AlphaEvolve search over function classes.
That is exactly what it did here. It searched over candidate functions and found that sinusoids consistently appeared to be the extremizer across many cases — building numerical confidence that the conjecture was correct. But numerical confirmation is not a proof. Mathematics requires a rigorous argument that holds for all cases, not just the sampled ones. AlphaEvolve could not produce that argument.
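The shape of this kind of search can be sketched in a few lines of Python. Everything below is illustrative: the fitness functional (the L1/L2 ratio of a cosine polynomial, which truncated square-wave series happen to score well on) is a made-up stand-in for the paper's actual inequality, and the (1+λ) mutation loop is a caricature of evolutionary search, not AlphaEvolve's machinery.

```python
import math
import random

def l1_over_l2(coeffs, grid=256):
    """Estimate ||p||_1 / ||p||_2 for p(x) = sum_k a_k * cos(k*x), k = 1..n,
    both norms taken over [0, 2*pi) under normalized measure."""
    # Parseval: (1/2pi) * integral of p^2 equals sum(a_k^2) / 2 for a cosine series.
    l2 = math.sqrt(sum(a * a for a in coeffs) / 2)
    if l2 == 0:
        return 0.0
    total = 0.0
    for i in range(grid):
        x = 2 * math.pi * i / grid
        total += abs(sum(a * math.cos((k + 1) * x) for k, a in enumerate(coeffs)))
    return (total / grid) / l2  # Riemann-sum estimate of the normalized L1 norm

def evolve(dim=4, children=20, gens=30, sigma=0.2, seed=0):
    """Bare-bones (1+lambda) evolutionary loop: mutate the incumbent candidate,
    keep any child that scores higher on the fitness functional."""
    rng = random.Random(seed)
    best = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    best_score = l1_over_l2(best)
    for _ in range(gens):
        for _ in range(children):
            child = [a + rng.gauss(0.0, sigma) for a in best]
            score = l1_over_l2(child)
            if score > best_score:
                best, best_score = child, score
    return best, best_score
```

By Cauchy–Schwarz the ratio never exceeds 1, and the loop only ever climbs, so running it yields exactly the kind of output described above: a best candidate and a best score. That is numerical evidence about where the extremizer lives — never a proof.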
This distinction — between numerical plausibility and rigorous proof — is not a minor technicality. It is the entire difference between "this looks right" and "this is proven." AlphaEvolve is powerful for the former. The proof step required something different: recognizing the problem type and knowing which classical result to apply. That is where ChatGPT operated.
What ChatGPT actually proved, technically
The duality argument that ChatGPT produced works roughly as follows. The toy inequality involves bounding the L1 norm of a trigonometric polynomial against a weighted combination of its coefficients. Duality in functional analysis allows you to reframe this as a problem about the dual space — instead of bounding the polynomial directly, you find the "worst-case" linear functional that tests it. The Fourier expansion of the square wave provides the relevant dual object, and the Nevanlinna two-constant theorem (a result from complex function theory about conformal mappings) constrains how large that dual object can be.
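The mechanism can be made concrete with the standard L1–L∞ pairing. This is the generic textbook version of the duality step, not the paper's exact argument:

```latex
\|p\|_{L^1}
  \;=\; \sup_{\|g\|_{L^\infty} \le 1} \left| \int_0^{2\pi} p(x)\, g(x)\, dx \right|
  \;\ge\; \left| \int_0^{2\pi} p(x)\, \operatorname{sgn}(\sin x)\, dx \right|,
```

and since the square wave has the Fourier expansion \(\operatorname{sgn}(\sin x) = \frac{4}{\pi} \sum_{k \text{ odd}} \frac{\sin kx}{k}\), testing \(p(x) = \sum_k b_k \sin kx\) against it turns the right-hand side into a coefficient-weighted lower bound:

```latex
\|p\|_{L^1} \;\ge\; 4 \left| \sum_{k \text{ odd}} \frac{b_k}{k} \right|.
```

Choosing a good test function \(g\) is the whole game; the square wave is the classical choice when the quantity of interest is a weighted sum of Fourier coefficients.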
None of these are new mathematical ideas. Duality in function spaces is a foundational technique. The Nevanlinna theorem has been known for decades. The Fourier expansion of the square wave is undergraduate material. What ChatGPT did was recognize that this specific problem was the type where those techniques apply together — and assemble them correctly.
That is pattern matching at a high level. It is not the creation of new mathematics. But it is also not trivial. The space of applicable techniques for any given problem is large, and knowing which combination to reach for — and retrieving a theorem from the literature that even Tao did not have immediately in mind — is a genuinely useful capability. The proof was correct. Tao verified it. It appeared in a peer-reviewed paper.
Tao's track record of AI assessment: September 2024 to March 2026
The reason Tao's statements on this subject carry unusual weight is that his assessments have been consistently calibrated — neither dismissive nor credulous — and they have updated visibly in response to evidence rather than maintaining a fixed position.
In September 2024, he described working with AI on mathematical problems as similar to advising "a mediocre, but not completely incompetent, graduate student." That is a specific metaphor: useful for grunt work and first-pass attempts, but requiring constant supervision and producing results that needed significant correction. He was skeptical about autonomous mathematical capability.
By early March 2026 — at an IPAM conference — his assessment had shifted. He described AI tools in mathematics as now saving "more time than they waste" and said the technology was "ready for primetime." That is a qualitative change in evaluation from someone who had previously been measured in his enthusiasm.
The March 23 paper represents a concrete, peer-reviewed data point behind that updated assessment. His 2023 prediction — written in a piece on AI-assisted mathematics — stated that "2026-level AI, when used properly, will be a trustworthy co-author in mathematical research." That prediction is now, as of this paper, partially fulfilled. He made the prediction with appropriate hedges ("when used properly"). The paper reflects those same hedges in practice: he directed the work, isolated the subproblem, verified the proof, and extended it. The AI filled one specific gap.
The Erdős problem pattern: what AI is and is not solving
Tao maintains a running record of AI progress on Erdős problems on his blog. The pattern that emerges from that record is instructive. AI models have made meaningful autonomous progress on a number of problems — but the problems where progress occurred tend to share a specific characteristic: they are cases where existing proofs or near-proofs existed in the literature, and the AI's main contribution was either locating those results or connecting known techniques to the problem in question.
Tao described this pattern directly on Mastodon: AI is well-suited for what he calls the "long tail" of problems — cases where standard techniques apply but no human had bothered to systematically attempt them. The hard end of mathematics — problems requiring new conceptual structures, new definitions, genuinely novel insights — remains out of reach. The Bernstein paper is a clear illustration: the AI's contribution used established techniques. The novel extension to exponential-type functions was Tao's.
Separately, in December 2025, Tao published a detailed account of how Erdős Problem #1026 was solved — through a combination of AI-assisted literature search, online collaboration, and a large language model that assembled a coherent proof from existing results. That case also followed the same pattern: the AI's contribution was synthesis and retrieval, not novel mathematical invention.
The emerging division of labor in AI-assisted mathematics
What the Bernstein paper illustrates concretely is a workflow that Tao and others have been describing in more abstract terms for two years. It has four components, each handled by a different tool:
Problem decomposition and direction — The human mathematician identifies what the core obstacle is, isolates it as a self-contained subproblem, and decides what kind of proof is needed. This is where mathematical judgment operates. Tao reduced a hard problem about Lebesgue constants to a toy inequality about trigonometric polynomials. No AI in this pipeline made that choice.
Numerical exploration — AlphaEvolve searches over large solution spaces and confirms whether a conjectured answer is plausible. This is the "is it worth pursuing" layer. It replaces hours of manual computation but produces no rigorous guarantees.
Technique identification and proof generation — LLMs like ChatGPT recognize problem types, retrieve relevant classical results, and assemble proof sketches. The reliability of this layer depends entirely on whether the problem falls within the distribution of what the model has seen — which is why it works on sub-problems involving established techniques and fails on genuinely novel ones.
Formal verification — Tools like Lean check proofs line by line. Tao has noted that LLM-generated proofs are "prone to hallucination" and that formal verification is preferable when feasible. For the Bernstein paper, Tao verified the ChatGPT proof himself rather than through a formal verifier — but the option exists and is increasingly practical.
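For a sense of what "checked line by line" means, here is the kind of statement a Lean proof assistant verifies mechanically — a deliberately trivial example using mathlib's `sq_nonneg`, nothing from the paper:

```lean
import Mathlib

-- Lean accepts this declaration only if the term `sq_nonneg x` really is
-- a proof of the stated inequality; there is no "looks plausible" middle ground.
example (x : ℝ) : 0 ≤ x ^ 2 := sq_nonneg x
```

The gap between this toy and a full research proof is enormous in labor, but not in kind: a formally verified proof either compiles or it does not, which is exactly the property that makes the approach attractive as a check on LLM output.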
What is notable about this division of labor is that it is already functioning in published research, not just in toy demonstrations. Tao's paper is peer-reviewed, is public on arXiv, and credits the AI contribution explicitly. That normalization — treating AI as a citable contributor — is new.
The OpenAI Spud connection: why timing matters
The Tao paper landed the same week that OpenAI internally confirmed its next major model — codenamed Spud — had completed pre-training. Sam Altman described it to staff as something that can "really accelerate the economy," according to reporting in The Information. These are not the same story, but they share a context worth acknowledging.
The model Tao used is not Spud — Spud reportedly finished training on approximately March 24, one day after the paper was posted. The most likely candidate is GPT-5.4 Pro or an equivalent current-generation model. What matters is that the capability demonstrated in the paper — retrieving a non-obvious theorem, assembling a correct duality proof, contributing a verifiable argument to publishable research — exists in models already deployed, before whatever Spud represents arrives.
For context on how the broader AI capability race sits right now: as we covered in our analysis of the ARC-AGI-3 benchmark launch, the best frontier models score 0.37% on a test where humans score 100% — suggesting that on the most general reasoning tasks, the gap remains very large. Mathematical sub-problems with established technique matches are not the same as general reasoning. Both data points are true simultaneously.
My Take
The most important sentence in Tao's paper is not the one crediting ChatGPT with a proof. It is the one where he describes using AI for literature review, proofreading, and figure generation as "almost mundane." That word — mundane — is doing more work than any benchmark score or capability claim from a lab. It describes a shift in the baseline experience of working with these tools, from something requiring careful management and significant error correction to something that functions well enough that a world-class mathematician no longer finds it worth remarking on.
The skeptical read of the Bernstein paper — that ChatGPT did pattern matching on a well-scoped sub-problem using classical techniques, and that Tao would have found the proof eventually anyway — is technically correct and also substantially beside the point. "Would have found it anyway" describes a large fraction of useful tools. The question is whether the tool compresses the timeline meaningfully. Tao's updated assessment suggests it does.
What I find harder to assess is how quickly this workflow scales beyond mathematicians like Tao — people with the judgment to decompose problems correctly, verify AI outputs rigorously, and extend AI-generated proof sketches to the cases that require novel mathematics. The workflow is real. The question is whether it requires a Fields Medalist to operate it, or whether it will become accessible to researchers at other levels. That question will be answered empirically over the next two years, not by anything any lab announces.
The Spud timing is a legitimate data point, not because Spud was involved in the paper, but because it establishes that the capabilities Tao used were available in current-generation models before whatever the next generation delivers. If current models can contribute to published research, the reasonable prior is that more capable models will do so more reliably and on harder problems. That is not a guarantee. But it is a more defensible prediction than most of what gets said about AI capabilities in any given week.
- Tao's March 23 arXiv paper credits ChatGPT Pro with proving one of its key inequalities — a duality-based argument using the Nevanlinna two-constant theorem.
- AlphaEvolve confirmed the inequality numerically. ChatGPT provided the rigorous proof. Tao extended that proof and handled the second half himself.
- The AI's contribution was pattern matching and literature retrieval — not new mathematics. Tao verified the proof and confirmed its correctness before including it.
- Tao's 2023 prediction that "2026-level AI will be a trustworthy co-author" appears to be arriving on or ahead of schedule.
- His assessment shifted from "mediocre grad student" (September 2024) to "saves more time than it wastes, ready for primetime" (March 2026).
- He also described AI for literature review, proofreading, and figures as "almost mundane" — a normalization of AI-assisted workflow in research.
- The model used was not Spud — most likely GPT-5.4 Pro. Spud completed pre-training one day after the paper was posted.
- ARC-AGI-3 scores (0.37% for best frontier models vs. human 100%) show the gap on general reasoning remains large — mathematical sub-problems with established technique matches are a different, narrower domain.
FAQ
Did ChatGPT actually solve a math problem that Terence Tao could not?
Partially, and with important caveats. Tao had already reduced the main problem to a simpler toy inequality. He fed that specific sub-problem to ChatGPT, which returned a correct duality-based proof using the Nevanlinna two-constant theorem — a classical result Tao had not had in immediate recall. Tao verified the proof, then extended it to handle the more general case that the paper required. The AI solved one precisely scoped piece; Tao solved the rest and directed the entire work.
What is the Nevanlinna two-constant theorem?
It is a result from complex analysis about the behavior of bounded analytic functions in a strip, relating the maximum modulus of a function on two parallel lines to its behavior in between via conformal mapping. In the context of the Bernstein paper, it provides a bound on how large the dual object in the duality argument can be — completing the proof of the toy inequality.
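In its strip form — often encountered as the Hadamard three-lines theorem, though the exact variant used in the paper may differ — the statement reads: if \(f\) is analytic and bounded on the strip \(0 \le \operatorname{Im} z \le 1\), with \(|f| \le M_0\) on the line \(\operatorname{Im} z = 0\) and \(|f| \le M_1\) on the line \(\operatorname{Im} z = 1\), then its size in between is controlled by a geometric interpolation of the two boundary constants:

```latex
|f(x + iy)| \;\le\; M_0^{\,1-y} \, M_1^{\,y}, \qquad 0 \le y \le 1.
```

The "two constants" are \(M_0\) and \(M_1\); the exponent interpolating between them is the harmonic measure of the boundary lines, which for the strip is simply \(y\).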
What is AlphaEvolve and how is it different from ChatGPT?
AlphaEvolve is Google DeepMind's evolutionary agent, built on Gemini. It generates many candidate solutions, evaluates them against a fitness function, and iterates. It is suited for numerical search and verification — confirming whether a conjecture is plausible across many cases. ChatGPT is a language model that recognizes problem types and retrieves applicable techniques from its training distribution. In the Bernstein paper, AlphaEvolve confirmed the approach numerically and ChatGPT provided the rigorous proof step.
How has Tao's assessment of AI in mathematics changed over time?
In 2023 he predicted that 2026-level AI would be a "trustworthy co-author." In September 2024 he described current AI as like "a mediocre, but not completely incompetent, graduate student." By early March 2026, at an IPAM conference, he said AI now "saves more time than it wastes" and is "ready for primetime." The March 23 paper represents a concrete, peer-reviewed data point consistent with that updated assessment.
What is the connection to OpenAI's Spud model?
The model Tao used was not Spud — Spud reportedly completed pre-training on March 24, one day after the paper was posted. The connection is contextual: the capabilities demonstrated in the paper existed in currently deployed models before Spud arrives. If current models can contribute to published research in mathematics, the prior for more capable models doing so more reliably is reasonable, though not guaranteed.
What types of math problems can AI currently solve autonomously?
Based on Tao's observations and the Erdős problem record, AI performs best on the "long tail" — problems where standard techniques from the existing literature apply, but no human had systematically attempted to deploy them. Problems requiring genuinely new mathematical structures, new definitions, or novel conceptual insights remain beyond current autonomous AI capability. The Bernstein paper fits the former category: the proof used established techniques; the novel extension was Tao's work.
Where can I read Tao's actual blog post and the arXiv paper?
Tao's blog is at terrytao.wordpress.com — the March 23 post is the most recent as of this writing. The arXiv paper is tagged math.CA and math.CV under his name; search "Tao Bernstein Lebesgue 2026" on arxiv.org to locate it directly.
The honest caveat is this: one paper — even one authored by the greatest living mathematician — is a data point, not a trend line. The Bernstein proof used established techniques on a well-scoped problem that Tao had already isolated. The cases that would represent a genuine qualitative shift — AI contributing novel mathematical insight to problems that resist standard technique application — have not yet appeared in peer-reviewed work. That bar may be reached. It may not. Tao's track record of calibrated assessment suggests his updated optimism is worth taking seriously. It does not guarantee he is right.