For years, the most powerful AI systems lived behind locked doors at OpenAI and Google. You could talk to them through an app, but you never saw the code, the weights, or the tools they used to get work done.
That story just changed.
In late 2025, a new wave of open source AI agents arrived, led by Zhipu AI’s GLM 4.6V and local frameworks like Goose. These aren’t just chatbots that reply with text. They are smart digital helpers that can see images and video, read huge documents, plan multi-step tasks, and act across your apps and files.
The wild part: you can download many of these tools, run them on your own hardware, and build your own agents. No big research lab required.
Let’s break down what that actually means for you.
What Is an AI Agent and Why Should You Care?
An AI agent is like a supercharged digital assistant. It does not just answer questions in a chat box. It can:
- Look at images, charts, and interfaces
- Read long PDFs, emails, and web pages
- Decide what to do next
- Call tools, scripts, and APIs to act on your behalf
Think of it as a smart co-worker that lives inside your computer.
A standard large language model (LLM) can answer a question like:
“Explain this paragraph in plain English.”
An AI agent can handle a full workflow, for example (sketched in code after this list):
- Read a 40-page PDF report.
- Pull key numbers from charts.
- Update a spreadsheet with those numbers.
- Draft a summary email to your team.
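To make that concrete, here is a minimal sketch of what that kind of chained workflow looks like in code. Every function in it is a hypothetical placeholder (the names, the fake report text, and the fake numbers are made up for illustration); in a real setup each step would call the model or a tool such as a PDF parser or a spreadsheet API.

```python
# A rough sketch of the "read report -> extract numbers -> update sheet -> draft email"
# workflow. The helpers below are hypothetical placeholders, not a real library.

def read_report(path: str) -> str:
    """Placeholder: load and return the report text (e.g. via a PDF parser)."""
    return "Q3 revenue was 1.2M, up 8% from Q2..."

def extract_key_numbers(text: str) -> dict:
    """Placeholder: ask the model to pull out the figures you care about."""
    return {"q3_revenue": 1_200_000, "q2_growth_pct": 8}

def update_spreadsheet(numbers: dict) -> None:
    """Placeholder: write the numbers into a sheet via its API."""
    print(f"Updated spreadsheet with {numbers}")

def draft_summary_email(numbers: dict) -> str:
    """Placeholder: ask the model for a short summary email."""
    return f"Team, Q3 revenue came in at {numbers['q3_revenue']:,}."

# The "agent" part is simply that these steps run as one chained workflow,
# with the model deciding what to extract and what to write at each step.
report_text = read_report("q3_report.pdf")
numbers = extract_key_numbers(report_text)
update_spreadsheet(numbers)
print(draft_summary_email(numbers))
```

The point is not the specific functions but the shape: one request kicks off several dependent steps, and the output of each step feeds the next.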
Modern agents like GLM 4.6V and frameworks like Goose go even further. They work across text, code, images, and even video. They also keep track of context across many steps, not just one reply at a time.
AI agent vs normal AI chatbot: what is the real difference?
A normal AI chatbot is like a smart conversation partner. You ask something, it answers, then it waits for your next question.
An AI agent is more like a junior assistant. It can:
- Open files
- Search the web
- Call APIs
- Run scripts
- Change things in your apps
Here is a simple comparison in words:
- Chatbot: You paste a paragraph from an email and ask, “How should I reply?” It gives you a draft.
- Agent: It reads your full inbox, spots emails that need replies, drafts answers, files receipts into folders, and updates your task manager.
Another example:
- Chatbot: You paste a chart and ask what it means.
- Agent: It reads the chart, opens your project spreadsheet, updates the forecast tab, and leaves comments with its reasoning.
Agents don’t just respond to single messages. They run workflows with many steps and tools.
Why AI agents matter for students, creators, and small businesses
You don’t have to be a developer to benefit from this shift.
Students can use an AI agent to:
- Read several research papers in one go
- Extract key claims, methods, and charts
- Build a study guide, flashcards, or a comparison table
Creators can:
- Feed in a long video recording
- Let the agent cut it into chapters
- Get a script outline, blog post draft, and social captions from the same source
Small businesses can:
- Load four competing vendors’ financial reports at once
- Ask the agent to compare pricing, terms, and risk points
- Get a clean table and a short recommendation for the leadership team
Work that once needed a full team of interns or analysts can now sit on one person’s laptop. That is why open source agents are such a big deal.
Meet GLM 4.6V: The Open Source AI Agent That Shocked Big Tech
GLM 4.6V, built by Zhipu AI, is the model that made a lot of people at OpenAI and Google sit up straight.
Here is why:
- It is open source and MIT licensed. You can use it, modify it, and host it almost anywhere.
- It is multimodal. It reads text, images, charts, UI screenshots, and video.
- It has a 128k context window, enough for roughly 150 pages of mixed text and visuals or about an hour of video.
- It has native tool calling, including tools that take images and other visuals directly as inputs.
Unlike older vision models that mainly “describe what they see,” GLM 4.6V uses what it sees to decide what to do next. It can be the backbone of an AI agent that observes, plans, and acts.
The large 106B-parameter version targets cloud and high-performance setups, while the 9B "Flash" version is tuned for local deployment and is free to use. Pricing for the big version is far lower than that of many closed multimodal models, even though it competes with or beats them on hard benchmarks.
You can see how the broader GLM family is positioned on the GLM series page on Hugging Face, which gives a sense of how quickly these open models are improving.
How GLM 4.6V turned images, video, and web pages into real AI actions
Most older “vision + language” models did a kind of visual telephone game. You sent an image, the model converted it into a long text description, then passed that along to a tool.
That process was:
- Slow
- Easy to lose details
- Hard to scale to many images or frames
GLM 4.6V takes a different path. It treats images and video frames as first-class inputs for tools.
Imagine this workflow:
- You send the model 200 slide images from a pitch deck.
- The model picks out the most important charts.
- It calls a reporting tool, passing those chart images directly as parameters.
- The tool returns visual outputs, like a grid of comparison charts.
- The model looks at those images, reasons over them, and writes a final report.
No long text descriptions in the middle. The visuals stay visual the whole time.
Or picture a product comparison page: the model can scan the screenshot, detect prices and key specs, send that image into a table-building tool, and bring back a neat comparison table for you.
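Here is a hedged illustration of that idea in code. The tool schema and the call below are hypothetical, loosely modeled on common JSON-style tool-calling formats; GLM 4.6V's actual request format may differ, so treat this as a picture of the concept rather than its real API.

```python
# Illustration only: a hypothetical tool definition and a hypothetical model-issued
# call, showing the idea of passing images directly as tool arguments instead of
# passing a lossy text description of them.

build_comparison_chart = {
    "name": "build_comparison_chart",
    "description": "Turn one or more chart images into a combined comparison chart.",
    "parameters": {
        "type": "object",
        "properties": {
            # The key idea: the parameter carries images (or references to them),
            # not captions describing the images.
            "chart_images": {
                "type": "array",
                "items": {"type": "string", "description": "image reference or base64 data"},
            },
            "title": {"type": "string"},
        },
        "required": ["chart_images"],
    },
}

# What a call could look like once the model has picked the important slides:
example_tool_call = {
    "tool": "build_comparison_chart",
    "arguments": {
        "chart_images": ["slide_041.png", "slide_087.png", "slide_112.png"],
        "title": "Revenue vs. churn across the pitch deck",
    },
}

print(example_tool_call["arguments"]["chart_images"])
```

The detail that matters is that `chart_images` stays visual all the way through, which is exactly what the older "describe it in text first" pipeline could not do.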
This is what makes GLM 4.6V feel like a true agent backbone instead of a fancy caption generator.
Why the long context window changes what AI agents can do
Context size sounds abstract, so let’s put it in human terms.
A 128k token window is like giving your AI agent a massive whiteboard and bookshelf for each task. It can “remember” roughly:
- Around 150 pages of text
- Dozens of high-resolution images
- Or about an hour of video transcripts plus key frames
All in one go.
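If you want to sanity-check those numbers, here is a rough back-of-the-envelope calculation. The per-page and per-image token counts are assumptions (real counts vary a lot with text density and image resolution), so treat the output as a ballpark, not a spec.

```python
# Back-of-the-envelope only: rough token budgets, not official numbers.

CONTEXT_WINDOW = 128_000   # tokens
TOKENS_PER_PAGE = 850      # rough assumption for dense, text-heavy pages
TOKENS_PER_IMAGE = 1_000   # rough assumption; varies a lot by resolution

pages_only = CONTEXT_WINDOW / TOKENS_PER_PAGE
print(f"~{pages_only:.0f} pages of pure text")   # roughly 150 pages

# Mixed input: 100 pages of text plus a batch of images still fits in one window.
mixed = 100 * TOKENS_PER_PAGE + 40 * TOKENS_PER_IMAGE
print(f"100 pages + 40 images ≈ {mixed:,} tokens of {CONTEXT_WINDOW:,}")
```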
That means you can:
- Compare four full company reports at once, not one at a time
- Feed in a long research paper plus all its figures and supplements
- Drop in a full match or lecture recording and ask for chapter notes, key moments, and follow-up questions
Because GLM 4.6V’s vision and language parts were trained to work together on long, mixed inputs, it keeps track of images, tables, and text across that whole span. So your agent does not lose track of earlier pages or slides.
For real workflows, this means fewer hacks, fewer manual splits, and far more “just give it everything and ask for what you want.”
Open source, low cost, and local: the real threat to OpenAI and Google
On paper, OpenAI and Google still run some of the biggest, most polished AI systems. What unsettles them is the combination GLM 4.6V brings:
- MIT license, so companies can use and modify it without tight legal limits
- A free 9B “Flash” version that runs on consumer GPUs or strong laptops
- A 106B version that is far cheaper per million tokens than many closed rivals
- Benchmark scores that rival or beat larger models on math, web navigation, and multimodal tests
Startups and enterprises can now ask a simple question:
“Why lock into one vendor if we can host a state-of-the-art multimodal AI ourselves?”
When you can run an AI agent locally, combine it with your private data, and keep full control of security and cost, the grip of closed labs starts to weaken.
My Personal Experience: How an Open Source AI Agent 10x’d My Workflow
The hype feels very different when an AI agent starts touching your real work.
I set up a workflow that used GLM 4.6V as the main model, plus an agent framework similar to Goose on my machine. My goal was simple: turn long, messy research into scripts and articles much faster, without burning out.
Here is what changed.
What changed when I let an AI agent handle my research and content
Before, a full research-heavy article could take me 4 to 6 hours:
- Skimming 6 to 10 papers
- Copying key charts
- Taking notes
- Drafting a structure
- Writing and editing
With the agent in place, my workflow flipped:
- I dropped PDFs, slide decks, and a few web links into a folder.
- The agent “ingested” them, reading text, charts, and tables together.
- It generated:
  - A bullet summary of each source
  - A comparison of the main claims
  - A suggested outline for an article or video script
When GLM 4.6V saw charts or tables inside those PDFs, it did not just describe them. It used their values in the outline and suggested where visuals might support the story.
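For anyone curious what the "drop files in a folder" step looked like, here is a stripped-down sketch. The `summarize_source` function is a hypothetical stand-in for whatever client you use to call the model (a local server, an agent framework, and so on), and the folder name is made up.

```python
from pathlib import Path

# A stripped-down sketch of the "drop files in a folder" ingestion step.
# summarize_source is a hypothetical placeholder; swap in your real model call.

RESEARCH_DIR = Path("research_inbox")
SUPPORTED = {".pdf", ".pptx", ".md", ".txt"}

def summarize_source(path: Path) -> str:
    """Placeholder for the real call that reads text, charts, and tables together."""
    return f"[summary of {path.name} would go here]"

summaries = []
for path in sorted(RESEARCH_DIR.glob("*")):
    if path.suffix.lower() in SUPPORTED:
        summaries.append(summarize_source(path))

# The same pattern extends to the later steps: compare claims across summaries,
# then draft an outline. I later added a verification pass ("show me the raw
# values you extracted before summarizing") after the chart-misreading incident.
print("\n\n".join(summaries))
```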
This felt like running my own “2026 AI playbook” in real time: instead of just reading about new tools, I wired them straight into my workflow.
There were rough edges. Once, the agent misread a chart axis and flipped two numbers, which would have made a big claim wrong. Luckily, I still cross-check important stats, so I caught it. After that, I added a simple step: “Show me the raw values you extracted before summarizing.” That fixed most of those issues.
Another time, it tried to write code snippets for a charting library I do not use. I tweaked the tool config to prefer my usual stack, and the next run matched my setup.
Real results: time saved, output gained, and what still needs a human
When everything clicked, the numbers were hard to ignore:
- A 4-hour research + drafting session shrank to about 30 to 40 minutes
- I could ship 2 or 3 solid pieces in the time I used to write one
- I felt less drained, because the agent handled the grindy parts: sorting, comparing, and first drafts
That speed opened space for better work:
- More time refining arguments and structure
- More room for original takes and experiments
- Less stress when deadlines piled up
At the same time, the agent did not replace me.
I still handle:
- Final judgment on which sources to trust
- Tone, voice, and story choices
- Ethical calls around what to include, what to skip, and how to frame things
In other words, the AI agent became a force multiplier, not a ghostwriter. It gave me the equivalent of a fast, tireless research assistant, while I stayed in charge of taste and responsibility.
Other Open Source AI Agent Projects You Should Know
GLM 4.6V is the “brain” side of this story, but it is not alone. There is a growing ecosystem around open source AI agents.
Two projects stand out:
- Goose, by Block, which focuses on running practical agents on your local machine
- The Agentic AI Foundation, which helps keep standards like MCP open and neutral
Together, they show that agents are not just about big models. They are also about shared rules and tools that anyone can plug into.
Goose: the practical open source AI agent that lives on your computer
Goose, created by Block (the company behind Square and Cash App), is an open source AI agent framework you can run locally. According to its GitHub project page, Goose is:
- Local-first and open source, so you stay in control of data
- Able to work with many language models, both commercial and open
- Built to plug into tools through the Model Context Protocol (MCP)
In simple terms, Goose is the “hands” and “nervous system” for your AI agents. It connects models to:
- Code editors
- Terminals
- GitHub
- Cloud APIs
- File systems
MCP is the common language that lets AI talk to thousands of tools. The Model Context Protocol site describes it as a shared standard that lets agents discover tools, ask for data, and run actions in a safe, controlled way.
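To give a feel for how small an MCP tool can be, here is a minimal sketch based on the official MCP Python SDK's FastMCP helper (`pip install mcp`). The SDK is evolving quickly, so check its docs for the current API; this is meant to show the shape of an MCP tool, not to be copy-paste production code.

```python
# A minimal sketch of exposing one tool over MCP, using the Python SDK's FastMCP helper.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes-demo")

@mcp.tool()
def count_words(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    # Runs the server so an MCP-aware agent can discover and call count_words.
    mcp.run()
```

Once a server like this is running, an MCP-aware agent such as Goose can discover `count_words`, read its description and argument types, and call it whenever a task needs it.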
If you are a developer, this is the point where it gets very real. You can:
- Let an agent debug code, run tests, and suggest fixes
- Ask it to refactor a project and open pull requests
- Have it manage routine tasks like updating docs and release notes
Instead of copying and pasting between windows, the agent works directly with your tools.
Agentic AI Foundation: why a shared standard matters for everyone
The Agentic AI Foundation (AAIF), launched under the Linux Foundation, takes a bigger-picture view. It exists to keep standards for AI agents open, secure, and fair.
A good way to think about AAIF is to compare it to USB for hardware or HTTP for the web. Those standards made it possible to plug almost any device into almost any computer, or load almost any site in almost any browser.
AAIF does something similar for agentic AI:
- It steers open governance for MCP and related projects
- It helps prevent one company from owning the “socket” that all agents must use
- It supports secure, scalable infrastructure so enterprises can trust these systems
You can see how the Linux Foundation positions this effort in its announcement of the Agentic AI Foundation, which highlights how shared standards help avoid vendor lock-in.
For regular users, the impact is simple: agents built with GLM 4.6V, Goose, or other tools are more likely to “just work together” instead of living in separate, closed silos.
Conclusion: The Age of Open AI Agents Has Started
For years, it felt like only OpenAI and Google could run true AI agents. Now, open source projects like GLM 4.6V and Goose have pushed that power into the hands of anyone with a decent GPU and some curiosity.
We now have:
- Models that can see, read, plan, and act across tools
- Frameworks that run locally and connect to thousands of services
- Shared standards that keep the agent layer open instead of locked down
The big question is no longer “Will AI agents arrive?” but “What will you build with them?”
If you start treating AI as a partner that can handle full workflows, not just single prompts, you can unlock a very real multiplier effect on your time and output.
Thanks for reading. How will you use open AI agents in your own work this year?