For years, the most powerful AI systems lived behind locked doors at OpenAI and Google. You could talk to them through an app, but you never saw the code, the weights, or the tools they used to get work done.
That story just changed.
In late 2025, a new wave of open source AI agents arrived, led by Zhipu AI’s GLM 4.6V and local frameworks like Goose. These aren’t just chatbots that reply with text. They are smart digital helpers that can see images and video, read huge documents, plan multi-step tasks, and act across your apps and files.
The wild part: you can download many of these tools, run them on your own hardware, and build your own agents. No big research lab required.
Let’s break down what that actually means for you.
What Is an AI Agent and Why Should You Care?
An AI agent is like a supercharged digital assistant. It does not just answer questions in a chat box. It can:
- Look at images, charts, and interfaces
- Read long PDFs, emails, and web pages
- Decide what to do next
- Call tools, scripts, and APIs to act on your behalf
Think of it as a smart co-worker that lives inside your computer.
A standard large language model (LLM) can answer a question like:
“Explain this paragraph in plain English.”
An AI agent can handle a full workflow, for example (sketched in code after this list):
- Read a 40-page PDF report.
- Pull key numbers from charts.
- Update a spreadsheet with those numbers.
- Draft a summary email to your team.
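To make that concrete, here is a minimal sketch of what that kind of chained workflow looks like in code. Every function in it is a hypothetical placeholder (the names, the fake report text, and the fake numbers are made up for illustration); in a real setup each step would call the model or a tool such as a PDF parser or a spreadsheet API.

```python
# A rough sketch of the "read report -> extract numbers -> update sheet -> draft email"
# workflow. The helpers below are hypothetical placeholders, not a real library.

def read_report(path: str) -> str:
    """Placeholder: load and return the report text (e.g. via a PDF parser)."""
    return "Q3 revenue was 1.2M, up 8% from Q2..."

def extract_key_numbers(text: str) -> dict:
    """Placeholder: ask the model to pull out the figures you care about."""
    return {"q3_revenue": 1_200_000, "q2_growth_pct": 8}

def update_spreadsheet(numbers: dict) -> None:
    """Placeholder: write the numbers into a sheet via its API."""
    print(f"Updated spreadsheet with {numbers}")

def draft_summary_email(numbers: dict) -> str:
    """Placeholder: ask the model for a short summary email."""
    return f"Team, Q3 revenue came in at {numbers['q3_revenue']:,}."

# The "agent" part is simply that these steps run as one chained workflow,
# with the model deciding what to extract and what to write at each step.
report_text = read_report("q3_report.pdf")
numbers = extract_key_numbers(report_text)
update_spreadsheet(numbers)
print(draft_summary_email(numbers))
```

The point is not the specific functions but the shape: one request kicks off several dependent steps, and the output of each step feeds the next.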
Modern agents like GLM 4.6V and frameworks like Goose go even further. They work across text, code, images, and even video. They also keep track of context across many steps, not just one reply at a time.
AI agent vs normal AI chatbot: what is the real difference?
A normal AI chatbot is like a smart conversation partner. You ask something, it answers, then it waits for your next question.
An AI agent is more like a junior assistant. It can:
- Open files
- Search the web
- Call APIs
- Run scripts
- Change things in your apps
Here is a simple comparison in words:
- Chatbot: You paste a paragraph from an email and ask, “How should I reply?” It gives you a draft.
- Agent: It reads your full inbox, spots emails that need replies, drafts answers, files receipts into folders, and updates your task manager.
Another example:
- Chatbot: You paste a chart and ask what it means.
- Agent: It reads the chart, opens your project spreadsheet, updates the forecast tab, and leaves comments with its reasoning.
Agents don’t just respond to single messages. They run workflows with many steps and tools.
Why AI agents matter for students, creators, and small businesses
You don’t have to be a developer to benefit from this shift.
Students can use an AI agent to:
- Read several research papers in one go
- Extract key claims, methods, and charts
- Build a study guide, flashcards, or a comparison table
Creators can:
- Feed in a long video recording
- Let the agent cut it into chapters
- Get a script outline, blog post draft, and social captions from the same source
Small businesses can:
- Load four competing vendors’ financial reports at once
- Ask the agent to compare pricing, terms, and risk points
- Get a clean table and a short recommendation for the leadership team
Work that once needed a full team of interns or analysts can now sit on one person’s laptop. That is why open source agents are such a big deal.
Meet GLM 4.6V: The Open Source AI Agent That Shocked Big Tech
GLM 4.6V, built by Zhipu AI, is the model that made a lot of people at OpenAI and Google sit up straight.
Here is why:
- It is open source and MIT licensed. You can use it, modify it, and host it almost anywhere.
- It is multimodal. It reads text, images, charts, UI screenshots, and video.
- It has a 128k context window, enough for roughly 150 pages of mixed text and visuals or about an hour of video.
- It has native tool calling, including tools that take images and other visuals directly as inputs.
Unlike older vision models that mainly “describe what they see,” GLM 4.6V uses what it sees to decide what to do next. It can be the backbone of an AI agent that observes, plans, and acts.
The large 106B-parameter version targets cloud and high-performance setups, while the 9B "Flash" version is tuned for local deployment and is free to use. Pricing for the big version is far lower than that of many closed multimodal models, even though it competes with or beats them on hard benchmarks.
You can see how the broader GLM family is positioned on the GLM series page on Hugging Face, which gives a sense of how quickly these open models are improving.
How GLM 4.6V turned images, video, and web pages into real AI actions
Most older “vision + language” models did a kind of visual telephone game. You sent an image, the model converted it into a long text description, then passed that along to a tool.
That process was:
- Slow
- Easy to lose details
- Hard to scale to many images or frames
GLM 4.6V takes a different path. It treats images and video frames as first-class inputs for tools.
Imagine this workflow:
- You send the model 200 slide images from a pitch deck.
- The model picks out the most important charts.
- It calls a reporting tool, passing those chart images directly as parameters.
- The tool returns visual outputs, like a grid of comparison charts.
- The model looks at those images, reasons over them, and writes a final report.
No long text descriptions in the middle. The visuals stay visual the whole time.
Or picture a product comparison page: the model can scan the screenshot, detect prices and key specs, send that image into a table-building tool, and bring back a neat comparison table for you.
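Here is a hedged illustration of that idea in code. The tool schema and the call below are hypothetical, loosely modeled on common JSON-style tool-calling formats; GLM 4.6V's actual request format may differ, so treat this as a picture of the concept rather than its real API.

```python
# Illustration only: a hypothetical tool definition and a hypothetical model-issued
# call, showing the idea of passing images directly as tool arguments instead of
# passing a lossy text description of them.

build_comparison_chart = {
    "name": "build_comparison_chart",
    "description": "Turn one or more chart images into a combined comparison chart.",
    "parameters": {
        "type": "object",
        "properties": {
            # The key idea: the parameter carries images (or references to them),
            # not captions describing the images.
            "chart_images": {
                "type": "array",
                "items": {"type": "string", "description": "image reference or base64 data"},
            },
            "title": {"type": "string"},
        },
        "required": ["chart_images"],
    },
}

# What a call could look like once the model has picked the important slides:
example_tool_call = {
    "tool": "build_comparison_chart",
    "arguments": {
        "chart_images": ["slide_041.png", "slide_087.png", "slide_112.png"],
        "title": "Revenue vs. churn across the pitch deck",
    },
}

print(example_tool_call["arguments"]["chart_images"])
```

The detail that matters is that `chart_images` stays visual all the way through, which is exactly what the older "describe it in text first" pipeline could not do.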
This is what makes GLM 4.6V feel like a true agent backbone instead of a fancy caption generator.
Why the long context window changes what AI agents can do
Context size sounds abstract, so let’s put it in human terms.
A 128k token window is like giving your AI agent a massive whiteboard and bookshelf for each task. It can “remember” roughly:
- Around 150 pages of text
- Dozens of high-resolution images
- Or about an hour of video transcripts plus key frames
All in one go.
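If you want to sanity-check those numbers, here is a rough back-of-the-envelope calculation. The per-page and per-image token counts are assumptions (real counts vary a lot with text density and image resolution), so treat the output as a ballpark, not a spec.

```python
# Back-of-the-envelope only: rough token budgets, not official numbers.

CONTEXT_WINDOW = 128_000   # tokens
TOKENS_PER_PAGE = 850      # rough assumption for dense, text-heavy pages
TOKENS_PER_IMAGE = 1_000   # rough assumption; varies a lot by resolution

pages_only = CONTEXT_WINDOW / TOKENS_PER_PAGE
print(f"~{pages_only:.0f} pages of pure text")   # roughly 150 pages

# Mixed input: 100 pages of text plus a batch of images still fits in one window.
mixed = 100 * TOKENS_PER_PAGE + 40 * TOKENS_PER_IMAGE
print(f"100 pages + 40 images ≈ {mixed:,} tokens of {CONTEXT_WINDOW:,}")
```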
That means you can:
- Compare four full company reports at once, not one at a time
- Feed in a long research paper plus all its figures and supplements
- Drop in a full match or lecture recording and ask for chapter notes, key moments, and follow-up questions
Because GLM 4.6V’s vision and language parts were trained to work together on long, mixed inputs, it keeps track of images, tables, and text across that whole span. So your agent does not lose track of earlier pages or slides.
For real workflows, this means fewer hacks, fewer manual splits, and far more “just give it everything and ask for what you want.”
Open source, low cost, and local: the real threat to OpenAI and Google
On paper, OpenAI and Google still run some of the biggest, most polished AI systems. What unsettles them is the combination GLM 4.6V brings:
- MIT license, so companies can use and modify it without tight legal limits
- A free 9B “Flash” version that runs on consumer GPUs or strong laptops
- A 106B version that is far cheaper per million tokens than many closed rivals
- Benchmark scores that rival or beat larger models on math, web navigation, and multimodal tests
Startups and enterprises can now ask a simple question:
“Why lock into one vendor if we can host a state-of-the-art multimodal AI ourselves?”
When you can run an AI agent locally, combine it with your private data, and keep full control of security and cost, the grip of closed labs starts to weaken.
My Personal Experience: How an Open Source AI Agent 10x’d My Workflow
The hype feels very different when an AI agent starts touching your real work.
I set up a workflow that used GLM 4.6V as the main model, plus an agent framework similar to Goose on my machine. My goal was simple: turn long, messy research into scripts and articles much faster, without burning out.
Here is what changed.
What changed when I let an AI agent handle my research and content
Before, a full research-heavy article could take me 4 to 6 hours:
- Skimming 6 to 10 papers
- Copying key charts
- Taking notes
- Drafting a structure
- Writing and editing
With the agent in place, my workflow flipped:
- I dropped PDFs, slide decks, and a few web links into a folder.
- The agent “ingested” them, reading text, charts, and tables together.
- It generated:
  - A bullet summary of each source
  - A comparison of the main claims
  - A suggested outline for an article or video script
When GLM 4.6V saw charts or tables inside those PDFs, it did not just describe them. It used their values in the outline and suggested where visuals might support the story.
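For anyone curious what the "drop files in a folder" step looked like, here is a stripped-down sketch. The `summarize_source` function is a hypothetical stand-in for whatever client you use to call the model (a local server, an agent framework, and so on), and the folder name is made up.

```python
from pathlib import Path

# A stripped-down sketch of the "drop files in a folder" ingestion step.
# summarize_source is a hypothetical placeholder; swap in your real model call.

RESEARCH_DIR = Path("research_inbox")
SUPPORTED = {".pdf", ".pptx", ".md", ".txt"}

def summarize_source(path: Path) -> str:
    """Placeholder for the real call that reads text, charts, and tables together."""
    return f"[summary of {path.name} would go here]"

summaries = []
for path in sorted(RESEARCH_DIR.glob("*")):
    if path.suffix.lower() in SUPPORTED:
        summaries.append(summarize_source(path))

# The same pattern extends to the later steps: compare claims across summaries,
# then draft an outline. I later added a verification pass ("show me the raw
# values you extracted before summarizing") after the chart-misreading incident.
print("\n\n".join(summaries))
```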
This felt like running my own “2026 AI playbook” in real time: instead of just reading about new tools, I wired them straight into my workflow.
There were rough edges. Once, the agent misread a chart axis and flipped two numbers, which would have made a big claim wrong. Luckily, I still cross-check important stats, so I caught it. After that, I added a simple step: “Show me the raw values you extracted before summarizing.” That fixed most of those issues.
Another time, it tried to write code snippets for a charting library I do not use. I tweaked the tool config to prefer my usual stack, and the next run matched my setup.
Real results: time saved, output gained, and what still needs a human
When everything clicked, the numbers were hard to ignore:
- A 4-hour research + drafting session shrank to about 30 to 40 minutes
- I could ship 2 or 3 solid pieces in the time I used to write one
- I felt less drained, because the agent handled the grindy parts: sorting, comparing, and first drafts
That speed opened space for better work:
- More time refining arguments and structure
- More room for original takes and experiments
- Less stress when deadlines piled up
At the same time, the agent did not replace me.
I still handle:
- Final judgment on which sources to trust
- Tone, voice, and story choices
- Ethical calls around what to include, what to skip, and how to frame things
In other words, the AI agent became a force multiplier, not a ghostwriter. It gave me the equivalent of a fast, tireless research assistant, while I stayed in charge of taste and responsibility.
Other Open Source AI Agent Projects You Should Know
GLM 4.6V is the “brain” side of this story, but it is not alone. There is a growing ecosystem around open source AI agents.
Two projects stand out:
- Goose, by Block, which focuses on running practical agents on your local machine
- The Agentic AI Foundation, which helps keep standards like MCP open and neutral
Together, they show that agents are not just about big models. They are also about shared rules and tools that anyone can plug into.
Goose: the practical open source AI agent that lives on your computer
Goose, created by Block (the company behind Square and Cash App), is an open source AI agent framework you can run locally. According to its GitHub project page, Goose is:
- Local-first and open source, so you stay in control of data
- Able to work with many language models, both commercial and open
- Built to plug into tools through the Model Context Protocol (MCP)
In simple terms, Goose is the “hands” and “nervous system” for your AI agents. It connects models to:
- Code editors
- Terminals
- GitHub
- Cloud APIs
- File systems
MCP is the common language that lets AI talk to thousands of tools. The Model Context Protocol site describes it as a shared standard that lets agents discover tools, ask for data, and run actions in a safe, controlled way.
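To give a feel for how small an MCP tool can be, here is a minimal sketch based on the official MCP Python SDK's FastMCP helper (`pip install mcp`). The SDK is evolving quickly, so check its docs for the current API; this is meant to show the shape of an MCP tool, not to be copy-paste production code.

```python
# A minimal sketch of exposing one tool over MCP, using the Python SDK's FastMCP helper.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes-demo")

@mcp.tool()
def count_words(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    # Runs the server so an MCP-aware agent can discover and call count_words.
    mcp.run()
```

Once a server like this is running, an MCP-aware agent such as Goose can discover `count_words`, read its description and argument types, and call it whenever a task needs it.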
If you are a developer, this is the point where it gets very real. You can:
- Let an agent debug code, run tests, and suggest fixes
- Ask it to refactor a project and open pull requests
- Have it manage routine tasks like updating docs and release notes
Instead of copying and pasting between windows, the agent works directly with your tools.
Agentic AI Foundation: why a shared standard matters for everyone
The Agentic AI Foundation (AAIF), launched under the Linux Foundation, takes a bigger-picture view. It exists to keep standards for AI agents open, secure, and fair.
A good way to think about AAIF is to compare it to USB for hardware or HTTP for the web. Those standards made it possible to plug almost any device into almost any computer, or load almost any site in almost any browser.
AAIF does something similar for agentic AI:
- It steers open governance for MCP and related projects
- It helps prevent one company from owning the “socket” that all agents must use
- It supports secure, scalable infrastructure so enterprises can trust these systems
You can see how the Linux Foundation positions this effort in its announcement of the Agentic AI Foundation, which highlights how shared standards help avoid vendor lock-in.
For regular users, the impact is simple: agents built with GLM 4.6V, Goose, or other tools are more likely to “just work together” instead of living in separate, closed silos.
Conclusion: The Age of Open AI Agents Has Started
For years, it felt like only OpenAI and Google could run true AI agents. Now, open source projects like GLM 4.6V and Goose have pushed that power into the hands of anyone with a decent GPU and some curiosity.
We now have:
- Models that can see, read, plan, and act across tools
- Frameworks that run locally and connect to thousands of services
- Shared standards that keep the agent layer open instead of locked down
The big question is no longer “Will AI agents arrive?” but “What will you build with them?”
If you start treating AI as a partner that can handle full workflows, not just single prompts, you can unlock a very real multiplier effect on your time and output.
Thanks for reading. How will you use open AI agents in your own work this year?