Google Just Changed the Game: Small AI Models That Reason, and an AI That Does Real Science

In the ever-accelerating race to build smarter, more capable artificial intelligence, Google has quietly delivered not one—but two revolutionary breakthroughs that could reshape how we think about AI development, scientific discovery, and the very future of research.

One innovation teaches small AI models to think like human experts, not just mimic them. The other? An AI “co-scientist” that solved a biological mystery that stumped researchers for over a decade—in just a few days.

Let’s unpack both—and why they matter more than you might realize.


Part 1: Supervised Reinforcement Learning (SRL) — How Google Taught Tiny AIs to Really Think

For years, AI researchers have faced a fundamental trade-off:

  • Supervised learning gives models the “right answers” during training—but often leads to rote memorization and poor generalization, especially on complex reasoning tasks.
  • Reinforcement learning (RL) lets models explore and discover solutions on their own—but requires massive compute, carefully designed reward signals, and often fails when even a single step in a long chain is wrong.

Enter Supervised Reinforcement Learning (SRL)—a brilliant hybrid developed by researchers at Google Cloud AI and UCLA that merges the best of both worlds in a way that shouldn’t work… but does.

The Problem: Small Models Fail at Hard Reasoning

Take Qwen2.5-7B-Instruct, a capable open-source language model with just 7 billion parameters. Feed it a tough math problem from the s1K-1.1 dataset, the kind drawn from AMC and AIME competitions, and it starts hallucinating. Even with perfect “teacher” examples, standard supervised fine-tuning just makes it copy token by token without ever grasping the logic.

Worse? On small datasets (as few as 1,000 examples), performance can actually drop below that of the base model. The model isn’t learning how to solve; it’s just parroting.

The SRL Fix: Make the AI “Earn” the Answer

SRL flips the script. Instead of giving the model a full solution and saying “learn this,” it:

  1. Breaks expert solutions into step-by-step trajectories (like a math proof or a code-debugging path).
  2. Forces the model to generate a private reasoning trace inside <think> tags (its “scratch pad”).
  3. Has the model output one action at each step (e.g., the next line of code or an algebraic manipulation).
  4. Compares that action to the expert’s using string similarity (via Python’s difflib) and assigns an immediate reward.

Crucially, every step gets graded—even if the final answer is wrong. This creates dense, continuous feedback, teaching the model which decisions matter in a reasoning chain.
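
Here’s what that grading step might look like in Python. This is a minimal sketch: it uses difflib as the paper describes, but the <think>-tag parsing and action format are simplified assumptions, not the actual training harness:

```python
import difflib
import re

def parse_action(model_output: str) -> str:
    """Drop the private <think>...</think> trace and keep only the action."""
    return re.sub(r"<think>.*?</think>", "", model_output, flags=re.DOTALL).strip()

def step_reward(model_output: str, expert_action: str) -> float:
    """Dense per-step reward: string similarity between the model's action
    and the expert's, in [0, 1]. Every step earns a grade, even when the
    final answer turns out to be wrong."""
    action = parse_action(model_output)
    return difflib.SequenceMatcher(None, action, expert_action).ratio()

# A partially correct algebra step still earns partial credit:
output = "<think>Isolate x by dividing both sides by 4.</think>x = 12 / 4"
print(round(step_reward(output, "x = 12 / 4"), 2))  # 1.0: exact match
print(round(step_reward(output, "x = 3"), 2))       # ~0.53: partial credit, not 0
```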

Think of it like giving a student a textbook answer—but making them re-derive every line, earning points for correct logic, not just the final number.

The Results? Stunning.

Testing on Qwen2.5-7B-Instruct with the same s1K-1.1 data:

Benchmark   Baseline   After SRL   After SRL + RLVR
AMC 23      50.0       57.5        –
AIME 24     13.3       16.7        20.0
AIME 25     6.7        13.3        10.0

Note: the AIME 25 score dips after RLVR because it’s a harder subset, yet 10.0 is reportedly still the highest open-source result to date.

Even more impressive? SRL more than doubled performance on code reasoning. On SWE-Bench Verified, the SRL-trained model achieved:

  • 14.8% in Oracle File Edit mode (vs. 5.8% baseline)
  • 8.6% end-to-end (vs. 3.2%)

That’s competitive with models 10x its size—and it used only 5,000 verified trajectories (expanded to 134,000 steps).
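
To see how 5,000 trajectories fan out into 134,000 step-level examples, consider this hedged sketch of the expansion; the field names and prompt layout are invented for illustration and are not the paper’s actual schema:

```python
def expand_trajectory(problem: str, expert_steps: list[str]) -> list[dict]:
    """Turn one expert solution into one training example per step.

    Each example shows the problem plus the expert prefix so far and asks
    the model for only the next action, so a few thousand trajectories
    fan out into tens of thousands of step-wise examples. (Field names
    here are illustrative, not the paper's schema.)
    """
    examples = []
    for i, step in enumerate(expert_steps):
        examples.append({
            "context": problem + "\n" + "\n".join(expert_steps[:i]),
            "target_action": step,
        })
    return examples

steps = ["Let x be the unknown.", "Set up 4x = 12.", "Divide both sides: x = 3."]
print(len(expand_trajectory("Solve for x: 4x = 12", steps)))  # 3 examples from 1 trajectory
```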

Why This Matters for Open-Source AI

SRL requires no giant reward models, no H100 clusters, and works with tiny datasets. It’s designed for the real world—where most developers don’t have Google-scale resources.

By reframing reasoning as action selection (not next-token prediction), SRL lets small models:

  • Avoid overfitting
  • Self-correct mid-reasoning
  • Generalize from sparse examples

In short: You no longer need 400B parameters to get deep reasoning—just smarter training.


Part 2: Meet Google’s AI “Co-Scientist” — Solving Real Biological Mysteries

While one Google team was revolutionizing training methods, DeepMind was building something even more ambitious: an AI system that does original scientific research.

This isn’t just AI-assisted science—it’s AI leading the charge.

The System: A Multi-Agent “Lab” Built on Gemini 2.0

Dubbed the AI co-scientist, this system isn’t a single model. It’s a collaborative swarm of specialized AI agents, each mimicking a role in a real research team:

  • Generation Agent: Brainstorms hypotheses through internal debate.
  • Reflection Agent: Acts as peer reviewer, critiquing weaknesses.
  • Ranking Agent: Runs Elo-style tournaments (like chess ratings) to score competing ideas; a minimal sketch of this rating update follows below.
  • Evolution Agent: Combines top hypotheses or explores radical hybrids.
  • Meta-Review Agent: Learns from past cycles to improve the whole system.

Humans set the goal and give high-level feedback—but the AI does the heavy intellectual lifting.
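
DeepMind hasn’t published the Ranking Agent’s internals, but an Elo-style update is a well-known mechanism, so here’s a minimal sketch of the idea; the K-factor, starting ratings, and the stub judge are all assumptions:

```python
import itertools

K = 32  # chess-style K-factor; the co-scientist's real value is unpublished

def expected_score(r_a: float, r_b: float) -> float:
    """Expected win probability for A under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
    """Shift both ratings toward the observed head-to-head outcome."""
    delta = K * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

def judge(a: str, b: str) -> bool:
    """Stand-in for the agents' head-to-head debate over two hypotheses."""
    return len(a) >= len(b)

# Every hypothesis starts at the same rating; pairwise "debates" reorder the field.
ratings = {"hypothesis_A": 1200.0, "hypothesis_B": 1200.0, "hypothesis_C": 1200.0}
for a, b in itertools.combinations(ratings, 2):
    ratings[a], ratings[b] = update(ratings[a], ratings[b], judge(a, b))
print(sorted(ratings, key=ratings.get, reverse=True))
```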

Breakthrough #1: New Drugs for Liver Fibrosis (Published in Advanced Science)

The Challenge: Liver fibrosis, a deadly scarring disease, has no effective drugs. Existing lab models poorly mimic human livers, and decades of research have yielded little.

The AI’s Task: Find novel epigenomic drug targets (mechanisms that regulate gene activity without changing the underlying DNA sequence).

What Happened?
Given one prompt and access to literature, the AI:

  • Analyzed thousands of papers
  • Proposed three drug classes: HDAC inhibitors, DNMT1 inhibitors, and BRD4 inhibitors
  • Even specified exact experiments: “Use single-cell RNA sequencing on human hepatic organoids.”

Validation:
Researchers tested the suggestions on miniature human livers grown from stem cells. Results?

  • HDAC and BRD4 inhibitors significantly reduced fibrosis
  • Vorinostat (an FDA-approved cancer drug in the HDAC class) didn’t just stop scarring—it boosted healthy liver tissue growth

The Shock:
Lead researcher Dr. Gary Peltz (Stanford) checked PubMed: over 180,000 papers on liver fibrosis, but only 7 mentioned Vorinostat—and just 2 tested it. The AI found the signal in the noise instantly.

Even more telling? Two human-selected drug targets (with more literature support) failed completely. The AI’s picks outperformed them both.

Next Steps:
Peltz’s team is now testing if Vorinostat can reverse established fibrosis—and in talks with pharma for clinical trials.

Breakthrough #2: Cracking a 10-Year Bacterial Mystery (Published in Cell)

The Puzzle: Tiny genetic elements called CFPICIs (capsid-forming phage-inducible chromosomal islands) mysteriously jump between distant bacterial species—even though their viral “taxis” (phages) usually infect only one host type.

Human researchers at Imperial College London spent 10+ years uncovering the trick: “tail piracy.”
CFPICIs build their own DNA-filled heads—but steal tails from other phages, even across species, to infect new hosts.

The AI’s Test:
Given only pre-discovery data (nothing published after 2022), could it deduce the mechanism?

Result:
The AI generated five hypotheses. #1:

“CFPICIs achieve broad host range through capsid-tail incompatibility bypass via pseudotail interactions.”

That’s scientific jargon for “tail piracy.”
It matched the human discovery in concept, if not word for word, in days rather than decades.

No other AI system tested came close. Only the co-scientist’s multi-agent architecture, built on Gemini 2.0, pieced it together.

As Dr. Peltz put it:

“AI output still needs human validation—but the speed boost is unreal.”

His lab now uses the co-scientist for genetic discovery and drug repurposing, believing it will soon impact real patient care.


Why These Breakthroughs Change Everything

For AI Developers:

  • SRL democratizes advanced reasoning. You no longer need trillion-parameter models to solve complex problems—just smarter training on modest hardware.
  • Open-source communities can now build reasoning-competent models with limited data and compute.

For Scientists:

  • The AI co-scientist accelerates discovery from years to days.
  • It uncovers non-obvious connections buried in literature—connections humans miss due to cognitive bias or information overload.
  • It suggests testable experiments, closing the loop between hypothesis and validation.

For Society:

  • Faster drug discovery → lives saved.
  • More efficient AI → greener, cheaper, more accessible technology.
  • A new era of human-AI collaboration, where machines handle data-heavy reasoning, and humans provide vision and ethics.

The Big Question: What Happens Next?

If AI can already solve decade-old scientific puzzles, how long before it makes discoveries we can’t even verify—or understand?

Google’s work suggests we’re entering an era where:

  • Small AIs replace bloated models for specialized tasks.
  • AI research teams become standard in labs worldwide.
  • Scientific progress shifts from linear to exponential.

One thing’s clear: the future of intelligence—both artificial and human—is being rewritten, one breakthrough at a time.


Final Thoughts: A New Paradigm for AI and Science

Google’s dual breakthroughs—SRL for efficient reasoning and the AI co-scientist for discovery—aren’t just technical wins. They represent a philosophical shift:

Intelligence isn’t about size. It’s about how you learn—and who you collaborate with.

Whether you’re a developer, researcher, or just curious about AI’s future, one message echoes loud and clear:

The age of thinking machines has truly begun.
