A warehouse robot cannot fold laundry. A surgical robot cannot weld a seam. A cooking robot cannot open its own front door. Nearly every robot working today, however advanced, is built and trained for one narrow job. Change the task, and you mostly start over.
Physical Intelligence, a startup building what it calls a foundation model for physical action, is betting that this is about to change. The company describes its goal as the "GPT-2 moment for robots": one model that can drive different robot bodies, different arms, different hands, different ways of moving, without being retrained from scratch for each one.
Why One Robot Can't Do Two Jobs
Right now, building a robot looks a lot like building software before reusable code libraries existed. Every robot ships with its own stack. A perception system to see the world. A planning system to decide what to do. A motor control system to carry it out. Each piece is built and tuned for that specific robot and that specific task.
That's why a robot that sorts packages in a warehouse usually can't also fold a shirt, even though both tasks involve picking something up and moving it somewhere else. The underlying skill is similar in both cases: looking at an object, understanding what it is, and figuring out how to grasp it. The software just doesn't transfer.
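To make the coupling concrete, here is a minimal sketch of such a per-robot stack. Every class name, number, and method here is hypothetical and invented for illustration; the "detector" and "controller" are placeholders, not real perception or inverse kinematics.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    position: tuple  # (x, y, z) in metres, in this robot's own frame


class ConveyorBoxDetector:
    """Perception: hypothetical detector tuned only for boxes on a conveyor."""

    def detect(self, image):
        # Stand-in for a trained detector; always reports one box.
        return Detection(label="box", position=(0.5, 0.0, 0.2))


class TopDownGraspPlanner:
    """Planning: assumes rigid objects that can be grasped from above."""

    def plan(self, detection):
        x, y, z = detection.position
        # Hover 10 cm above the object, then descend to it.
        return [(x, y, z + 0.1), (x, y, z)]


class SixJointArmController:
    """Motor control: output shape is fixed by this particular arm."""

    NUM_JOINTS = 6

    def to_joint_targets(self, waypoint):
        # Placeholder, not real inverse kinematics: pad the waypoint to 6 values.
        return list(waypoint) + [0.0] * (self.NUM_JOINTS - len(waypoint))


def run_pipeline(image):
    # Each stage hands its output to the next; all three are task-specific.
    detection = ConveyorBoxDetector().detect(image)
    waypoints = TopDownGraspPlanner().plan(detection)
    controller = SixJointArmController()
    return [controller.to_joint_targets(w) for w in waypoints]


commands = run_pipeline(image=[[0, 0], [0, 0]])
```

In a stack shaped like this, moving from boxes to shirts would mean replacing the detector, dropping the planner's rigid-object assumption, and possibly swapping the controller for a different arm. Very little carries over, even though the overall pick-and-place shape is the same.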
Inside Physical Intelligence's pi0 and pi0.5
Physical Intelligence builds what is known as a Vision-Language-Action model, or VLA. The idea is to collapse perception, planning, and motor control into a single neural network that goes directly from camera input and a text instruction to motor commands. No separate systems handing information to each other. One network, start to finish.
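The interface this implies can be sketched in a few lines. The toy policy below is an illustration of the input and output shape only: the class name, the feature encoders, and the 7-value action vector are all assumptions made for this example, the weights are random and untrained, and none of it reflects how pi0 or pi0.5 are actually built.

```python
import math
import random


class ToyVLAPolicy:
    """Toy stand-in for a vision-language-action policy (hypothetical).

    Shows the interface only: (image, instruction) -> motor command.
    """

    def __init__(self, action_dim=7, feature_dim=8, seed=0):
        rng = random.Random(seed)
        self.feature_dim = feature_dim
        # One weight matrix maps joint image + text features to actions.
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(2 * feature_dim)]
            for _ in range(action_dim)
        ]

    def _encode_image(self, image):
        # Average normalized pixel intensities into feature_dim buckets.
        pixels = [p / 255.0 for row in image for p in row]
        sums = [0.0] * self.feature_dim
        counts = [0] * self.feature_dim
        for i, p in enumerate(pixels):
            sums[i % self.feature_dim] += p
            counts[i % self.feature_dim] += 1
        return [s / c if c else 0.0 for s, c in zip(sums, counts)]

    def _encode_text(self, instruction):
        # Bag-of-characters features; a real model would use a language model.
        features = [0.0] * self.feature_dim
        for ch in instruction.lower():
            features[ord(ch) % self.feature_dim] += 1.0
        total = sum(features) or 1.0
        return [f / total for f in features]

    def act(self, image, instruction):
        # A single forward pass: no separate perception or planning stage.
        features = self._encode_image(image) + self._encode_text(instruction)
        # tanh keeps every command value in [-1, 1].
        return [
            math.tanh(sum(w * f for w, f in zip(row, features)))
            for row in self.weights
        ]


policy = ToyVLAPolicy()
image = [[0, 128, 255], [64, 64, 64]]  # tiny grayscale "camera frame"
action = policy.act(image, "pick up the cup")
```

The point of the sketch is the signature of `act`: one call takes pixels and words in and returns motor values out, so there is no hand-off between separately built systems to retune when the task changes.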
The company's first model, pi0, was trained on data collected from multiple different robots performing a wide range of tasks. A newer version, pi0.5, pushes further. According to Physical Intelligence, pi0.5 can generalize to entirely new environments it never saw during training, including cleaning up a kitchen or bedroom in a home the model has never been inside.
The company is upfront that this isn't a finished product. By its own account, pi0.5 doesn't succeed on every attempt, and its current focus is on handling new settings rather than mastering new skills or fine dexterity.
Unitree's Volume Bet, Figure AI's Autonomy Bet, and Where PI Fits
Most of the attention in robotics right now goes to two very different strategies. Unitree, the Chinese robotics company, shipped over 5,500 humanoid robots in 2025, with its G1 model priced from roughly $13,000 to $16,000 depending on configuration. The strategy is volume and falling prices, the same playbook that turned solar panels and batteries into commodities.
Figure AI, valued at around $39 billion, has gone the other way. It has shipped far fewer units, but its robots can run autonomously for over 60 hours and complete tasks like sorting and cleaning without a human directing every move. They run on Helix, the company's own end-to-end neural network, a different approach from that of the other humanoid research platforms chasing the same goal.
Physical Intelligence sits outside this argument. It doesn't build robot hardware at all. Its bet is that whoever wins on volume and whoever wins on autonomy will both eventually need a model that runs on whatever hardware they already have, and that model could come from a company that builds neither the cheap robots nor the expensive ones.
The Gap Between This Idea and a Working Robot
The distance between this idea and a finished product is still wide. Foundation models for language took years and enormous datasets before they became reliable for everyday use, and physical data is far harder to collect than text scraped from the internet. Every robot demonstration in a lab is one data point. Getting a single model to handle the near-infinite variety of real homes, warehouses, and factories is a different scale of problem entirely.
NVIDIA's role is worth noting here too. The same Cosmos and Isaac simulation platforms that NVIDIA is building out with partners like LG for humanoid robots are also available to labs and startups working on generalist models. The training infrastructure is becoming a shared resource rather than any single company's edge, which could speed up how fast this approach matures, or make it harder for anyone to stay ahead for long.
My Take
The "GPT-2 moment" comparison is doing a lot of work, and it's worth being skeptical of it. GPT-2 was a language model evaluated on text, a domain where data is cheap and a mistake costs little. A robot that misjudges how to grasp a glass breaks the glass. Being wrong here is physical, not just a bad output on a screen. For now, the label feels premature.
That said, the framing of the problem, one model instead of one model per robot per task, is correct, and that kind of structural shift tends to compound once it starts working even partially. If pi0.5 can already handle homes it was never trained on, even imperfectly, that's further along than most people outside robotics realize.
Key Takeaways
- Physical Intelligence's pi0 and pi0.5 are Vision-Language-Action (VLA) models that combine perception, planning, and motor control into one neural network.
- pi0.5 has shown it can adapt to new homes and environments it was never trained on, though not on every attempt.
- Unitree (5,500+ robots shipped in 2025, $13K-16K price range) and Figure AI ($39B valuation, 60+ hour autonomy) represent two opposite hardware strategies.
- Physical Intelligence doesn't build hardware. Its bet is on a model layer that could run on either company's robots, or anyone else's.
- NVIDIA's Cosmos and Isaac platforms supply training infrastructure to generalist-model labs across the board, not just one side of the race.
FAQ
What is a Vision-Language-Action (VLA) model?
It's a type of AI model that takes camera images and a text instruction as input and outputs motor commands directly, instead of relying on separate systems for seeing, planning, and moving.
What is Physical Intelligence's pi0 model?
pi0 is Physical Intelligence's first generalist robot policy, trained on data from multiple different robots performing many tasks. pi0.5 is a newer version built to generalize better to environments it has never seen.
Can pi0.5 work on any robot without retraining?
Not on every robot, and not without limits. It has shown it can transfer to new tasks and new environments with less retraining than older approaches, but the company itself says results aren't consistent yet.
How does Physical Intelligence's approach compare to Unitree and Figure AI?
Unitree and Figure AI build robot hardware with their own software baked in. Physical Intelligence builds only the AI model, designed to potentially run across different hardware from different manufacturers.
Conclusion
Whether Physical Intelligence's approach becomes the standard, or stays one of several competing methods, is still an open question. What seems clear is that the next phase of the robot race may not be decided only by who ships the most units or who builds the most expensive hardware. It could come down to who builds the model that ends up running on top of all of it.
Source: Physical Intelligence, pi0.5 blog post; additional context from a YouTube analysis on the humanoid robot race.