Google's "Mathematical AGI" Moment: What DeepMind's Aletheia Really Proved (and Why Energy Is Now Part of the Story)

For years, a lot of people had a quiet line in their head about AI. Sure, it could talk, write, code, even make art, but it would hit a wall in real mathematics. Not contest math, not textbook exercises, but the kind of research problems where even understanding the question takes a specialist.

That line just got a lot harder to defend.

Google DeepMind revealed an AI research agent called Aletheia that autonomously solved six open, PhD-level math problems from active research areas, with no human help during the solving process, and under a real submission deadline. The reactions from working mathematicians are the part that sticks.

The breakthrough isn't "hard math," it's open math

The headline sounds dramatic until you separate two ideas: "difficult" and "unknown."

A hard contest problem is difficult, but the path to a solution usually exists inside a shared toolbox. Someone clever can combine known tricks under pressure and get it done. That's amazing, but it's still a sprint on a marked track.

Open research math is different. Sometimes only a few dozen people on Earth fully understand what's being asked. There may be no agreed solution strategy. There may not even be a solution. You can burn months chasing something that turns out to be impossible, or true for a boring reason no one noticed, or true but only after you invent a new lens to look through.

That's the context for the FirstProof challenge, a set of 10 frontier problems drawn from modern mathematical research. DeepMind's Aletheia solved six of them (Problems 2, 5, 7, 8, 9, and 10), fully correctly, within the official deadline, with zero human intervention during the solving run.

On-screen text explains the FirstProof challenge format and highlights that Aletheia solved 6 out of 10 problems.

A detail that matters: the remaining four problems did not get "pretty-looking" fake proofs. Aletheia either reported no solution found, or said nothing until time expired. In math, that restraint is not weakness. It's what keeps the whole project from turning into a time-wasting machine.

One problem, labeled Problem 7, had been open for years. People familiar with it said no other AI system got close. Aletheia solved it, and the proposer of the problem personally confirmed the reasoning.

And then there's the line from Terence Tao that hit like a cymbal crash: "AI has become my junior co-author." It's casual, but it carries a serious implication about where day-to-day research is headed.

When a system can search, fail, backtrack, and still land on correct new proofs, it's not doing "math homework." It's doing research behavior.

If you want a broader read on how messy the word AGI has become lately, this related piece puts some of the competing claims side by side: Inside Integral AI's first AGI-capable model.

What the FirstProof challenge tests (and why it's not the IMO)

The International Mathematical Olympiad is the famous benchmark most people know. It's brutal, and it rewards deep skill. Still, it's a closed world in an important way: you're expected to solve problems using known techniques, with creativity in how you combine them.

FirstProof is more like being dropped into a foggy forest with a compass that might be wrong.

These are research-level questions from active areas of mathematics. In some cases, you don't just need technique, you need a willingness to try an idea, realize it's dead, throw it out, and keep going anyway. That "keep going" part is where humans are limited by time, mood, ego, and basic fatigue. A machine agent doesn't have those limits. It has a compute budget, a clock, and rules.

Here's what makes the FirstProof result feel heavier than "AI got better at math":

Aletheia had to work through long chains of reasoning, including failed attempts, and still produce correct proofs. DeepMind ran two versions of Aletheia (built on slightly different underlying models) and used cross-checking between them to reduce the odds of a silent error.

That cross-check idea sounds simple, but in research math it's everything. A single convincing-but-wrong proof can burn weeks of an expert's life. The cost isn't embarrassment, it's lost human time.

For the primary source, DeepMind documented the FirstProof run in an arXiv paper: Aletheia tackles FirstProof autonomously (arXiv PDF).

Diagram shows Aletheia as a long-horizon research agent built on Gemini Deep Think, designed for sustained reasoning and self-correction.

How Aletheia works: a research agent that argues with itself

Aletheia is not framed as a chatbot that "knows math." It's described as a long-horizon research agent built on top of Gemini 3 Deep Think. The key is that it's designed to do work that takes time, where thousands of attempts can fail before one path clicks.

DeepMind's approach to reliability is the part worth sitting with. Instead of hoping the model "behaves," the system creates internal conflict on purpose. Two roles constantly push against each other:

  • The generator proposes ideas aggressively. It tries strategies, makes conjectures, explores routes that might be wrong, and keeps moving.
  • The verifier attacks every step. It checks logic, looks for cracks, and rejects anything that doesn't hold exactly.

So the system spends a lot of its time in a kind of structured argument. That might sound inefficient, but it matches what careful mathematicians already do in their head, just with more stamina and less ego.
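
As a mental model only, here is a minimal sketch of that generator-verifier loop. Everything in it (the function names, the state shape, the loop structure) is hypothetical illustration, not DeepMind's actual architecture or API:

```python
import random

def propose_step(state):
    """Generator role: aggressively propose a next step (may well be wrong)."""
    # Stand-in for a model call; picks any remaining idea, or None when stuck.
    ideas = state["open_ideas"]
    return random.choice(ideas) if ideas else None

def check_step(step):
    """Verifier role: accept a step only if it holds exactly."""
    # Stand-in for strict logical checking; rejects anything unproven.
    return step.get("verified", False)

def research_loop(problem, budget):
    state = {"proof": [], "open_ideas": list(problem["ideas"])}
    for _ in range(budget):                    # a compute budget, not patience
        step = propose_step(state)
        if step is None:
            return "no solution found"         # honest failure, no bluffing
        if check_step(step):
            state["proof"].append(step)        # keep only verified steps
            if step.get("closes_proof"):
                return state["proof"]          # a complete, checked proof
        state["open_ideas"].remove(step)       # either way, don't retry it
    return "no solution found"                 # budget exhausted
```

The structural point survives the simplification: only verified steps accumulate, dead ends get discarded instead of defended, and running out of ideas or budget produces an honest "no solution found" rather than a bluffed proof.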

The most important behavioral point shows up when things go badly. When Aletheia couldn't solve a problem, it didn't bluff. It either said "no solution found" or ran out the clock. DeepMind even says it's willing to solve fewer problems if that's the price of not producing nonsense.

That may sound conservative, but it's also how you build a tool people can actually trust.

DeepMind also released full interaction logs, including failed attempts and wrong turns. If you want to see what "research behavior" looks like in raw form, those logs matter as much as the final proofs: Aletheia interaction logs on GitHub.

Problem 7: the proof that made people uncomfortable (in a real way)

Problem 7 sits where algebraic topology and differential geometry overlap, which is a polite way of saying: you can lose a weekend just parsing the setup.

In simplified terms, the question asks whether a certain kind of discrete group can appear as the fundamental group of a compact, boundaryless manifold, under strict conditions on the universal cover. If that sentence feels like a filter, it is.

Aletheia didn't just answer. It proved the answer is "no" in two different ways, and that's the part that feels less like pattern matching and more like mathematical taste.

The video displays a plain-language summary of Problem 7, describing its link to manifolds, universal covers, and group actions.

The first proof: a clean contradiction using Lefschetz numbers

The first proof is almost rude in how direct it is.

Using the assumption about the universal cover (described as rationally acyclic in the narration), Aletheia computes a Lefschetz number tied to an element of order two. Then it forces the same quantity to be both non-zero and zero:

On one side, because the universal cover is rationally acyclic, only the degree-zero homology contributes, so the Lefschetz number must be non-zero. On the other side, because the group action is free with no fixed points, the Lefschetz fixed point theorem forces it to be zero. That creates an impossible equality, described bluntly as:

0 = ±1

Contradiction, done.
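
For readers who want the skeleton, here is roughly what that collision looks like in symbols. This is a sketch under stated assumptions (an order-two deck transformation g on a rationally acyclic universal cover, with the Lefschetz fixed point theorem applicable in the form Aletheia used), not a transcription of the actual proof:

```latex
% Lefschetz number of g acting on the universal cover \tilde{M}:
L(g) = \sum_{i \ge 0} (-1)^i \,\operatorname{tr}\bigl(g_* : H_i(\tilde{M};\mathbb{Q}) \to H_i(\tilde{M};\mathbb{Q})\bigr)

% \tilde{M} rationally acyclic: only H_0 \cong \mathbb{Q} survives, and g_*
% acts trivially there, so
L(g) = 1 \neq 0

% But g is a deck transformation of order two, hence acts freely with no
% fixed points, and the fixed point theorem then forces
L(g) = 0
% The on-screen summary renders the clash as 0 = \pm 1; either way, impossible.
```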

A highlighted contradiction shows a Lefschetz-number argument forcing an impossible equality, summarized as "0 equals plus or minus 1."

What's unsettling (in a good way) is how little structure the proof uses. It doesn't depend heavily on the geometry that makes the problem look intimidating. It also proves something stronger than the original question: in this setting, no discrete group containing torsion elements can work at all.

That "stronger than asked" move is something mathematicians recognize instantly. It's a sign the solver isn't just chasing the target, it's seeing the shape behind it.

The second proof: fully geometric, different tools, same collision

Then Aletheia offers a second proof that goes the opposite direction. Instead of staying abstract, it leans into geometry:

It constructs an equivariant map from the universal cover to a symmetric space associated with the group. Then it compares Lefschetz numbers on both sides.

  • On the universal cover side, the group action is free, so the Lefschetz number vanishes.
  • On the symmetric space side, the Cartan fixed point theorem forces fixed points, which pushes the Lefschetz number to be non-zero.

Different machinery, same contradiction, same "no."
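
In the same shorthand, and with the same caveat that this is a schematic rather than the proof itself, the second argument runs the comparison through an equivariant map from the universal cover into the symmetric space:

```latex
% Universal cover side: the action of g is free, so
L\bigl(g \curvearrowright \tilde{M}\bigr) = 0

% Symmetric space side: the Cartan fixed point theorem gives every
% finite-order isometry of G/K a fixed point, which forces
L\bigl(g \curvearrowright G/K\bigr) \neq 0

% Comparing the two Lefschetz numbers through the equivariant map
% f : \tilde{M} \to G/K produces the same contradiction, and the same "no".
```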

One expert reaction mentioned in the narration was that this looked like the first time they'd seen an AI combine multiple deep theorems in a way that didn't feel stitched together.

It wasn't cheap: the compute story matters

To solve Problem 7, Aletheia used 16 times the reasoning budget DeepMind used on a prior flagship math result (mentioned as the "Erdős 151 problem" in the narration). Whatever you call the comparison point, the punchline is simple: this took sustained exploration.

DeepMind even visualized the reasoning cost over time, showing repeated dead ends, backtracking, and persistence.

A chart of reasoning effort over time shows repeated spikes, indicating dead ends, backtracking, and renewed attempts before a final proof is found.

Another solved problem: a very "human" move in number theory and representation theory

Problem 7 gets the spotlight, but the other solves weren't soft targets.

One of the problems Aletheia solved came from number theory and representation theory. In broad terms, it involved matrix groups over non-Archimedean local fields, and the existence of a universal Whittaker function that guarantees a certain integral never vanishes across paired representations.

Even in a simplified retelling, the structure of the proof is the interesting bit.

Aletheia's approach starts with a choice that changes the whole board. It picks a particular Whittaker function that compresses the integration domain into a compact set, and in the process removes an entire complex parameter that would otherwise hang over the argument. With that simplification in place, the problem reduces to whether a finite functional can be zero.

From there, Aletheia runs a contradiction:

Assume the integral vanishes for all representations. Then use finite Fourier analysis to show that this assumption would force the representation to have invariant vectors under a subgroup larger than its conductor allows. That clashes with the definition of the conductor itself, so the assumption collapses.
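
Schematically, and with illustrative notation that is mine rather than DeepMind's (W_0 for the chosen Whittaker function, K_c for the compact domain it produces, \pi ranging over the paired representations), the contradiction has this shape:

```latex
% Step 1: the chosen Whittaker function W_0 compresses the integration
% domain to a compact set K_c and removes the complex parameter entirely.

% Step 2: assume the resulting finite functional vanishes for every \pi:
\Lambda(\pi) = \int_{K_c} W_0(k)\,\overline{W_\pi(k)}\,dk = 0 \quad \text{for all } \pi

% Step 3: finite Fourier analysis over K_c then forces \pi to carry vectors
% invariant under a subgroup larger than its conductor permits, which
% contradicts the definition of the conductor, so the assumption collapses.
```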

An on-screen summary shows the proof sketch: a carefully chosen Whittaker function, then a contradiction involving Fourier analysis and representation conductors.

The narration points out what human mathematicians immediately notice: that initial "pick the right Whittaker function" move feels like experience. It's the kind of simplifying decision people learn after years of wrestling with similar integrals.

For extra context on how DeepMind frames the jump from competition math to research math, their earlier related paper is also on arXiv: Towards Autonomous Mathematics Research (arXiv PDF).

Why people are calling this the end of "manual" math research

This part doesn't mean mathematicians are out of a job next week. It means the workflow changes.

For centuries, math progress has depended on humans doing everything by hand: reading, guessing, trying, failing, rewriting, checking, getting stuck, and sometimes giving up. The bottleneck has always been time and attention.

A research agent doesn't get tired. It doesn't cling to a beautiful idea that's wrong. It doesn't need a walk to cool off. It just keeps pushing until it finds a proof, or until the budget runs out.

That's why the "junior co-author" line lands. It's not about replacement, it's about throughput. If even a small slice of research time shifts from grind work to supervision and taste, the pace changes.

Also, there's a quiet second-order effect: if AI systems can propose and verify proofs faster than humans can read them, then human review becomes the new bottleneck. That's not a math problem, it's a social and tooling problem.

For a wider view on where Google seems to be steering agents and long-horizon systems next, this piece is a good companion: Demis Hassabis' Google AI 2026 vision.

Google's energy play: because long-horizon reasoning burns real power

The story took an unexpected turn, but it makes sense once you think about it.

At the same time as the Aletheia news, Google revealed plans for a major data center complex south of Minneapolis, tied to a renewable-heavy power buildout and what's described as the world's largest battery storage system.

The project details, as described:

  • About 1.4 GW of wind
  • 200 MW of solar
  • A 300 MW iron-air battery capable of delivering power for up to 100 hours (more than 4 days)

A clean energy infographic lists wind, solar, and a 300 MW iron-air battery providing up to 100 hours of storage for a Minnesota data center.

Unlike common lithium-ion grid batteries that often target 4 to 8 hours of storage, this system is meant for multi-day gaps, the kind you get in major weather events.

The iron-air battery tech comes from Form Energy and works through reversible rusting: discharging oxidizes iron with oxygen (rusting) to release energy, and charging runs the reaction in reverse, turning rust back into iron.
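
The arithmetic behind "multi-day" is worth a quick check. A short sketch using the headline ratings above (nameplate figures only; real usable energy depends on round-trip efficiency and degradation, which aren't specified here):

```python
power_mw = 300                 # rated power of the iron-air system
duration_h = 100               # rated discharge duration in hours

energy_mwh = power_mw * duration_h      # 30,000 MWh = 30 GWh of storage
days = duration_h / 24                  # about 4.2 days of continuous output

# Typical lithium-ion grid batteries at the same power, 4-8 hour durations:
li_ion_mwh = {hours: power_mw * hours for hours in (4, 8)}  # 1,200-2,400 MWh

print(f"Iron-air: {energy_mwh:,} MWh (~{days:.1f} days of continuous output)")
print(f"Lithium-ion at the same power: {li_ion_mwh} MWh")
```

That tenfold-plus gap in duration, not the power rating, is what makes the system relevant for multi-day weather events.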

This isn't just a "green PR" side quest. It's part of the compute story. Aletheia's Problem 7 run alone required 16 times the reasoning budget of a previous big math milestone. If long-horizon agents become normal, power becomes a hard constraint, not an afterthought.

For reporting on the Minnesota deal and the 100-hour system, here are two useful sources: American Public Power Association coverage of the 300 MW iron-air system and Fortune's report on the 100-hour battery buildout.



What I learned while sitting with this (my honest take)

I've seen enough AI demos to have a reflex now: wait for the fine print. So at first, I treated this like another "look how smart the model is" cycle that would fade once people checked the work.

Then I kept coming back to two details.

First, it solved six open problems under a deadline, and it didn't pretend to solve the rest. That sounds small, but it's the difference between a tool you can hand to an expert and a tool that wastes experts.

Second, the idea of releasing the interaction logs, dead ends included, changes the vibe. It's a very different feeling to watch a system take wrong turns, back up, and keep going. It feels less like magic, and more like process. Weirdly, that makes it more believable, not less.

Most of all, the energy angle snapped something into focus for me. Long-horizon reasoning isn't just "a smarter model." It's an industrial activity. If we want AI agents that think for hours or days, we're also signing up for the power, storage, and grid decisions that come with that.

Conclusion: mathematical AGI is starting to look less like a slogan

Aletheia's FirstProof results don't settle every AGI argument, but they do change what's plausible. Solving open research problems, checking the work, refusing to bluff, and doing it within real constraints looks like the early shape of machine research.

At the same time, the Minnesota energy buildout hints at the next bottleneck: not ideas, but electricity. If this is where AI is going, AGI won't be just a software story. It'll be a systems story, from proofs to power plants.

What part feels bigger to you, the math itself, or the fact that someone is already building the infrastructure to run this kind of thinking at scale?
