They Tested AI vs 100,000 Humans on Creativity, and the Real Result Isn’t What the Headlines Say

If you only read the loudest posts online, you’d think the story is simple: AI beat humans at creativity. End of debate, pack it up.

But the actual study is both more interesting and a little unsettling. Yes, modern models can outperform the average person on certain creativity measures. No, that doesn’t mean machines are “creative like humans” in the way most of us mean it.

The short version is this: AI is getting very good at generating lots of different ideas that are far apart from each other. What it still struggles with is taste, judgment, and choosing what matters. That last part is where the best human work hides.

What they tested, and what “creativity” meant in this study

In January 2026, researchers from the University of Montreal published a large comparison in Scientific Reports that put today’s generative AI models up against a massive human dataset of more than 100,000 people.

That scale matters. With smaller studies, it’s easy to shrug and say, “Maybe they sampled weirdly.” With a dataset this big, patterns start to feel stubborn. The University of Montreal’s own write-up frames it clearly: some AI systems can beat average human performance on specific creativity tasks, while the most imaginative humans still stand apart (you can see the university summary in their explainer on the study).

Also, “creativity” here wasn’t judged by vibes. The researchers focused on divergent creativity, which is basically the ability to explore widely, to generate ideas that don’t all sit in the same neighborhood.

A key scoring concept in this study is semantic distance. If two words live far apart in meaning and usage, they’re “more distant.” The greater the distances across your set of ideas, the more divergent your output scores.
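To make that concrete, here is a rough sketch of measuring semantic distance with off-the-shelf embeddings. The embedding model below is my own pick for the demo, not necessarily what the researchers used, so the exact numbers will differ from the study’s scoring pipeline.

```python
# Rough illustration of semantic distance using off-the-shelf embeddings.
# The model choice here is mine, not necessarily the one used in the study.
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_distance(word_a: str, word_b: str) -> float:
    vec_a, vec_b = model.encode([word_a, word_b])
    return float(cosine(vec_a, vec_b))  # 0 = nearly identical meaning, higher = farther apart

print(semantic_distance("cat", "kitten"))      # small distance
print(semantic_distance("cactus", "divorce"))  # much larger distance
```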

The Divergent Association Task (DAT), the simple word test that made headlines

The headliner test was the Divergent Association Task (DAT). It sounds almost too simple: you’re asked to type a small list of words that are as unrelated to each other as possible.

Not opposites. Not categories. Just… far apart.

A rough example (not an official scoring demo) might look like:

Start prompt: “Give 10 unrelated words.”

Possible human-style attempt: “cactus, violin, tax, thunder, perfume, orbit, sponge, divorce, glacier, omelet.”

The idea is that these words don’t cluster naturally. They spread out. And the scoring looks at how far apart they are in semantic space.
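A simplified version of that scoring idea is easy to sketch: embed each word, then average the pairwise cosine distances, using the same distance idea as above. The official DAT has its own embedding model and scaling, so treat this only as an illustration of the concept.

```python
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model, my choice

def dat_style_score(words: list[str]) -> float:
    """Average pairwise cosine distance across the list (a simplified DAT-like score)."""
    vectors = model.encode(words)
    distances = []
    for a, b in combinations(vectors, 2):
        cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        distances.append(1.0 - cos_sim)
    return float(np.mean(distances))

spread_out = ["cactus", "violin", "tax", "thunder", "perfume",
              "orbit", "sponge", "divorce", "glacier", "omelet"]
clustered = ["cat", "dog", "hamster", "rabbit", "parrot",
             "goldfish", "ferret", "gecko", "pony", "mouse"]

print(dat_style_score(spread_out))  # should come out noticeably higher...
print(dat_style_score(clustered))   # ...than the all-animals list
```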

This is why the DAT makes headlines. It captures something real about the early phase of creativity: loose exploration. The brainstorming stage where you’re trying to escape the obvious.

The harder writing tests, where AI had more trouble

The study didn’t stop at isolated words. The team also tested writing tasks like short poems (including haiku-style constraints), compact movie summaries, and short fiction.

This is where things get harder for models, fast.

Writing isn’t just “be surprising.” Good writing needs a point. It needs coherence, selective restraint, and a sense of what to leave out. If the DAT measures wide exploration, writing tests also pressure the other side of creativity: shaping, committing, and making meaning.

For a quick mainstream summary of the study’s design and outcomes, ScienceDaily’s coverage is a useful starting point.

The shocking results: AI can beat the average, but hits a ceiling fast

Here’s the part that makes people do a double take: on some divergent creativity measures, AI systems can outperform the average human response.

That doesn’t mean AI has feelings, life experience, or artistic intent. It means one building block of creativity, wide idea generation, is something these models can do extremely well.

But when researchers stopped treating “humans” as one blob and looked at the distribution, the story flipped.

The more you focus on highly creative humans, the more AI falls behind.

Based on the reporting around the paper, the top half of human participants scored higher on average than any model tested. Narrow to the top quarter, and the gap grows. Look at the top 10 percent, and it’s not subtle anymore. The best human performers consistently produced outputs that models couldn’t match, especially once the tasks moved into writing where intention matters.
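To see why “beats the average” and “loses to the top” can both be true at once, here is a toy comparison with entirely made-up numbers, not the study’s data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up scores purely to illustrate the comparison logic, NOT the study's data.
human_scores = rng.normal(loc=78, scale=7, size=100_000)
model_score = 82.0

share_beaten = (human_scores < model_score).mean() * 100
top_10_cutoff = np.percentile(human_scores, 90)

print(f"Model beats roughly {share_beaten:.0f}% of humans")
print(f"But the top 10% of humans all score above {top_10_cutoff:.1f}")
```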

So what happened? The middle got squeezed.

A useful way to say it: AI is raising the baseline. If your job depends on “pretty good” creative output, the bar just moved. At the same time, truly excellent human creativity becomes more visible, because it’s no longer being compared to the average.

That’s the uncomfortable twist. It’s not “AI is creative now.” It’s “average creative work is no longer rare.”

Why smaller models can sometimes look more creative than bigger ones

One of the most surprising takeaways is that raw model size didn’t perfectly predict creative scores. In some cases, smaller models performed better than larger ones.

That sounds backward until you remember how these systems are shaped. Training choices, alignment tuning, and decoding settings can all change how diverse a model’s outputs are. Sometimes “better behaved” also means “more predictable,” and predictable can look less creative on these metrics.

There’s also a real trade-off hiding in plain sight: improvements aimed at safety, cost, and reliability can reduce output diversity. If a model is pushed toward safer, more standard responses, it may repeat patterns more often. Humans are messy and inefficient, and oddly, that inefficiency helps originality.

Exploration is cheap now, but judgment is still rare

This is the line I can’t stop thinking about: AI is great at exploration, but it doesn’t “know” what matters.

Models can sample widely across language space and combine distant concepts without getting stuck in the same personal habits. Many humans unconsciously circle around familiar topics, memories, and identity. Machines don’t have that personal gravity.

But humans have something models don’t: judgment. We feel when an idea lands. We sense emotional weight. We can connect an idea to culture, timing, and purpose, then commit to it.

AI can generate novelty, distance, and variation. It doesn’t understand significance. That missing piece shows up more clearly in writing tasks, where being selective is part of the craft.

What this means for everyday creators, students, and teams using AI in 2026

In 2026, AI is best seen as a brainstorming partner and a variation engine, not a full replacement for skilled creators. It can help you escape the blank page and explore angles you wouldn’t have considered at 9:30 p.m. on a Tuesday.

The risk is different. The risk is that you confuse more output with better output, and your work starts to feel smooth but empty, like it was sanded down.

If you’re building content, writing scripts, designing lessons, or pitching campaigns, this study is basically a warning label: you still need a human editor brain. If you don’t bring taste, you’ll get generic “fine.”

If you want context on how fast everyday AI tools are changing and getting more independent, this internal piece connects the dots well: Future of Independent AI Agents in 2026.

Use AI for the “wide search,” then switch to human taste for the final choices

A simple workflow that’s been working for me (and it’s not fancy) looks like this, with a rough code sketch of the first step after the list:

First, ask for volume: 10 to 20 angles, metaphors, hooks, or outlines. No “best,” because “best” makes the model average itself out.

Then pick 2 to 3 worth keeping. Not the most impressive-sounding ones, but the ones that match your goal and audience.

After that, rewrite with a point. Add specifics from real life: a detail, a mistake you made, a moment you noticed something.

Finally, cut. Cut again. This is where voice shows up.
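For that first “ask for volume” step, a minimal sketch with the OpenAI Python SDK might look like this. The model name, temperature, and prompt wording are my own choices, not something the study prescribes.

```python
# Minimal "wide search" step with the OpenAI Python SDK (model name, temperature,
# and prompt wording are my own choices; assumes OPENAI_API_KEY is set).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you have access to
    temperature=1.0,      # a bit more adventurous for the exploration phase
    messages=[{
        "role": "user",
        "content": (
            "Give me 15 genuinely different angles for an article about creative work in the AI era. "
            "Make them far apart from each other. Do not rank them or pick a best one."
        ),
    }],
)

print(response.choices[0].message.content)
# Picking, rewriting with a point, and cutting stay human.
```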

AI outputs can sound polished while still saying nothing. You only notice that when you force the piece to take a stance.

One setting and one prompt can change everything, and that is the catch

Another core insight from the study: AI creativity is highly adjustable.

A single parameter (often called “temperature” in many systems) changes how safe or adventurous the output becomes. Low settings tend to repeat high-probability phrases. Higher settings tend to explore more unusual combinations, which can score as more creative on measures like semantic distance.
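If you want to see what that knob does mechanically, here is a toy version of temperature applied to a made-up next-word distribution. Real systems apply it to the model’s logits before sampling, but the effect is the same: low values concentrate probability on the obvious pick, high values spread it out.

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities, scaled by temperature."""
    scaled = np.array(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Made-up scores for four candidate next words.
words = ["ocean", "sea", "harbor", "omelet"]
logits = [4.0, 3.5, 2.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = apply_temperature(logits, t)
    print(t, dict(zip(words, probs.round(3))))

# At 0.2, nearly all probability lands on "ocean".
# At 2.0, it spreads out, and the odd choice ("omelet") starts getting picked.
```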

But there’s a catch. Push too far and coherence breaks. The writing gets weird, scattered, and kind of tiring.

Prompting also matters a lot. Small instruction changes can meaningfully shift results, including prompts that push models to think about deeper language structure (like word origins) instead of surface meaning. That’s a big clue that AI creativity is not self-directed. It’s steered by human intent.

For another public summary of the findings and claims, EurekAlert’s release mirrors the same core conclusion: average can be beaten, top humans still lead.

What I learned while reading this, and how it changed how I use ai

I’ll be honest, my first reaction was a little defensive. Like, “Sure, but it can’t really write.” Then I caught myself. That’s not the point.

The point is that I’ve been using AI wrong when I’m tired. I ask for “the best intro,” “the best headline,” “the best conclusion.” And the model gives me the safe middle, because that’s what “best” usually means statistically.

Now I do something slightly different. I ask for 10 different angles and I tell it to make them genuinely different. Then I pause, read them like a picky editor, and choose one that makes me feel something, even if it’s imperfect.

The other change is boring but real: I spend more time cutting. I don’t let the draft stay long just because AI can keep talking. If a paragraph doesn’t earn its place, it goes. That’s where my voice comes back.

Also, side note, if you’re creating visuals to support writing, having the right tool matters more than people admit. This guide is handy for picking the right generator depending on your goal: Top AI Image Generators to Try in 2026.

Conclusion

The “shocking” headline is true in a narrow way: AI can match or beat average performance on certain divergent creativity tests. The real story is the ceiling. When creativity requires meaning, restraint, and judgment, top humans still lead.

Use AI to explore widely, then slow down and practice taste. That mix is where the good stuff happens.

Where has AI helped you more so far, brainstorming or final writing?
