The “Brain-Like” AI That Beat ChatGPT on Reasoning Tasks: Breakthrough or Just Training Tricks?
Every few months, headlines scream that a new artificial intelligence model has “outperformed ChatGPT” or “thinks like the human brain.” The latest buzz is around the Hierarchical Reasoning Model (HRM) — a small AI system that supposedly beat today’s largest language models at tricky reasoning puzzles.
So, is this really the start of a new AI revolution, or just another case of inflated research hype? Let’s dig in.
What is HRM, in simple terms?
Most large language models, like ChatGPT, work step-by-step, reasoning through a problem in a single chain of thought. HRM takes a different approach: it has two modules working together, a slower one that keeps track of the big picture and a faster one that works out the details.
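To make that concrete, here is a minimal sketch of the two-level idea: a slow "planner" loop wrapped around a fast "detail" loop. The update rules, sizes, and step counts below are placeholders made up for illustration; this is not HRM's actual code, only the shape of the nested recurrence.

```python
import numpy as np

# Illustrative sketch of a slow "big picture" module wrapped around a fast
# "detail" module. All update rules and sizes are invented placeholders.

def high_level_update(h, l):
    # Slow planner: revises the overall plan using the detail module's result.
    return np.tanh(h + 0.1 * l)

def low_level_update(l, h, x):
    # Fast worker: refines the working solution under the current plan.
    return np.tanh(l + 0.5 * h + x)

def hierarchical_step(x, n_high=4, n_low=8, dim=16):
    h = np.zeros(dim)            # high-level (planning) state
    l = np.zeros(dim)            # low-level (detail) state
    for _ in range(n_high):      # a few slow planning updates...
        for _ in range(n_low):   # ...each driven by many fast detail updates
            l = low_level_update(l, h, x)
        h = high_level_update(h, l)
    return h, l

h, l = hierarchical_step(np.random.randn(16))
print(h.shape, l.shape)
```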
The surprising thing? HRM is tiny — just 27 million parameters (compare that to GPT-4’s estimated trillions) and trained on only 1,000 puzzles. Yet it still outperformed much larger models on the ARC-AGI reasoning benchmark, a notoriously tough test designed to measure problem-solving and abstract reasoning.
On paper, this looks like David beating Goliath.
But here’s the twist
When independent researchers and the ARC-AGI benchmark team looked closer, they found that the architecture itself (the “brain-like” design) wasn’t the real reason HRM did so well.
Instead, most of the gains came from a training technique that got far less attention in the headlines: outer-loop refinement.
What is outer-loop refinement?
Think of it as trial and error with memory:
- Generate candidate solutions (like multiple guesses to a puzzle).
- Evaluate those guesses against the rules.
- Keep the best, discard the rest.
- Resample and refine using what worked.
- Repeat until the solution emerges.
This loop lets the model correct itself over multiple tries, rather than being stuck with its first answer.
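Here is a tiny, self-contained sketch of that cycle on a toy problem (finding a hidden number), just to show the generate-evaluate-keep-resample loop. The target, the proposal step, and the scoring rule are all stand-ins invented for illustration; HRM's real loop refines grid predictions from a neural network, not integers.

```python
import random

# Toy outer-loop refinement: repeated guess-check-refine on a hidden number.
# Everything here (TARGET, propose, score) is an invented stand-in.

random.seed(0)
TARGET = 42          # the "puzzle solution" the loop has to discover

def propose(around, n=8, spread=10):
    # 1. Generate candidate solutions near the current best guess.
    return [around + random.randint(-spread, spread) for _ in range(n)]

def score(candidate):
    # 2. Evaluate a guess against the rules (here: closeness to the target).
    return -abs(candidate - TARGET)

best = 0
for _ in range(30):                                # 5. Repeat until a solution emerges.
    candidates = propose(best)
    best = max(candidates + [best], key=score)     # 3. Keep the best, discard the rest.
    if best == TARGET:                             # 4. Next round resamples around what worked.
        break

print("best guess:", best)
```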
Here’s a simple example.
Example puzzle
Input grid:
. R .
. R .
. . .
Rule: Extend the red column downward.
Correct solution:
. R .
. R .
. R .
How HRM’s loop works:
- First tries: fill every cell red (wrong), copy the top row everywhere (matches this particular grid only by coincidence, wrong as a general rule), extend the vertical pattern downward (right).
- Keeps the “extend pattern” guess, refines it, and applies the logic again next round.
Over time, it learns that “continuing patterns” is a good rule for many puzzles.
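In code, the candidate-and-check step might look like the sketch below. The first puzzle is the one above; the second grid is a hypothetical extra example I have added to show why "copy the top row" only matches the first puzzle by coincidence, while "extend the pattern" holds up as a general rule. This is an illustration of the idea, not HRM's implementation.

```python
# Toy candidate evaluation for the column-extension puzzle. Grids are rows of
# characters; "." is empty, "R" is red. The second puzzle is a made-up extra
# example used to check which guess works as a general rule.

PUZZLES = [
    # (input grid, expected output)
    ([".R.", ".R.", "..."], [".R.", ".R.", ".R."]),          # puzzle from the article
    (["...", "R..", "R..", "..."], ["...", "R..", "R..", "R.."]),  # hypothetical second puzzle
]

def fill_all_red(grid):
    """Guess 1: paint every cell red."""
    return ["R" * len(row) for row in grid]

def copy_top_row(grid):
    """Guess 2: repeat the top row all the way down."""
    return [grid[0] for _ in grid]

def extend_pattern(grid):
    """Guess 3: continue each colored column downward to the bottom edge."""
    out = [list(row) for row in grid]
    for col in range(len(grid[0])):
        colored = [r for r, row in enumerate(grid) if row[col] != "."]
        if colored:
            color = grid[colored[0]][col]
            for r in range(colored[0], len(grid)):
                out[r][col] = color
    return ["".join(row) for row in out]

candidates = {
    "fill all red": fill_all_red,
    "copy top row": copy_top_row,
    "extend pattern": extend_pattern,
}

# Evaluate every guess on every puzzle; only a rule that works everywhere survives.
for name, rule in candidates.items():
    results = ["right" if rule(inp) == out else "wrong" for inp, out in PUZZLES]
    print(f"{name:15s} -> {results}")
```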
Why this matters
Outer-loop refinement is powerful because it turns reasoning into an iterative search process. Instead of guessing once, the model gets to try, check, and improve. That’s a huge advantage on puzzles like ARC-AGI.
But here’s the key:
- The hierarchical brain-like structure only helped a little.
- The training procedure (outer-loop refinement + puzzle-specific augmentation) did the heavy lifting.
In fact, without the refinement loop, HRM’s performance dropped sharply.
So, breakthrough or hype?
- Promising: HRM shows that smaller, specialized models can sometimes outperform giants like GPT-4 — especially on narrow tasks. That’s exciting, because it suggests we don’t always need trillion-parameter behemoths to make progress.
- But not a brain revolution: The headlines about “AI modeled on the human brain” are misleading. What really boosted performance was a clever training loop and tailored practice, not a fundamentally new type of reasoning.
It’s less “we built a brain” and more “we gave the model multiple tries and smart feedback.”
The takeaway
HRM is an interesting research direction, and outer-loop refinement could inspire new ways of training reasoning systems. But it’s not a magic human-like brain — yet.
The real lesson here: sometimes the secret to better AI isn’t bigger models or radical architectures — it’s how you train and refine them.