The “Brain-Like” AI That Beat ChatGPT on Reasoning Tasks: Breakthrough or Just Training Tricks?
Every few months, headlines scream that a new artificial intelligence model has “outperformed ChatGPT” or “thinks like the human brain.” The latest buzz is around the Hierarchical Reasoning Model (HRM) — a small AI system that supposedly beat today’s largest language models at tricky reasoning puzzles.
So, is this really the start of a new AI revolution, or just another case of inflated research hype? Let’s dig in.
What is HRM, in simple terms?
Most large language models, like ChatGPT, work step-by-step, reasoning through a problem in a single chain of thought. HRM takes a different approach: it has two modules working together, a slower one that keeps track of the big picture and a faster one that works out the details.
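To make that concrete, here is a minimal sketch of the two-level idea: a slow "planner" loop wrapped around a fast "detail" loop. The update rules, sizes, and step counts below are placeholders made up for illustration; this is not HRM's actual code, only the shape of the nested recurrence.

```python
import numpy as np

# Illustrative sketch of a slow "big picture" module wrapped around a fast
# "detail" module. All update rules and sizes are invented placeholders.

def high_level_update(h, l):
    # Slow planner: revises the overall plan using the detail module's result.
    return np.tanh(h + 0.1 * l)

def low_level_update(l, h, x):
    # Fast worker: refines the working solution under the current plan.
    return np.tanh(l + 0.5 * h + x)

def hierarchical_step(x, n_high=4, n_low=8, dim=16):
    h = np.zeros(dim)            # high-level (planning) state
    l = np.zeros(dim)            # low-level (detail) state
    for _ in range(n_high):      # a few slow planning updates...
        for _ in range(n_low):   # ...each driven by many fast detail updates
            l = low_level_update(l, h, x)
        h = high_level_update(h, l)
    return h, l

h, l = hierarchical_step(np.random.randn(16))
print(h.shape, l.shape)
```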
The surprising thing? HRM is tiny — just 27 million parameters (compare that to GPT-4’s estimated trillions) and trained on only 1,000 puzzles. Yet it still outperformed much larger models on the ARC-AGI reasoning benchmark, a notoriously tough test designed to measure problem-solving and abstract reasoning.
On paper, this looks like David beating Goliath.
But here’s the twist
When independent researchers and the ARC-AGI benchmark team looked closer, they found that the architecture itself (the “brain-like” design) wasn’t the real reason HRM did so well.
Instead, most of the gains came from a training technique that got far less attention in the headlines: outer-loop refinement.
What is outer-loop refinement?
Think of it as trial and error with memory:
- Generate candidate solutions (like multiple guesses to a puzzle).
- Evaluate those guesses against the rules.
- Keep the best, discard the rest.
- Resample and refine using what worked.
- Repeat until the solution emerges.
This loop lets the model correct itself over multiple tries, rather than being stuck with its first answer.
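Here is a tiny, self-contained sketch of that cycle on a toy problem (finding a hidden number), just to show the generate-evaluate-keep-resample loop. The target, the proposal step, and the scoring rule are all stand-ins invented for illustration; HRM's real loop refines grid predictions from a neural network, not integers.

```python
import random

# Toy outer-loop refinement: repeated guess-check-refine on a hidden number.
# Everything here (TARGET, propose, score) is an invented stand-in.

random.seed(0)
TARGET = 42          # the "puzzle solution" the loop has to discover

def propose(around, n=8, spread=10):
    # 1. Generate candidate solutions near the current best guess.
    return [around + random.randint(-spread, spread) for _ in range(n)]

def score(candidate):
    # 2. Evaluate a guess against the rules (here: closeness to the target).
    return -abs(candidate - TARGET)

best = 0
for _ in range(30):                                # 5. Repeat until a solution emerges.
    candidates = propose(best)
    best = max(candidates + [best], key=score)     # 3. Keep the best, discard the rest.
    if best == TARGET:                             # 4. Next round resamples around what worked.
        break

print("best guess:", best)
```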
Here’s a simple example.
Example puzzle
Input grid:
. R .
. R .
. . .
Rule: Extend the red column downward.
Correct solution:
. R .
. R .
. R .
How HRM’s loop works:
- First tries: fill every cell red (wrong), copy the top row everywhere (matches this particular grid only by coincidence, wrong as a general rule), extend the vertical pattern downward (right).
- Keeps the “extend pattern” guess, refines it, and applies the logic again next round.
Over time, it learns that “continuing patterns” is a good rule for many puzzles.
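In code, the candidate-and-check step might look like the sketch below. The first puzzle is the one above; the second grid is a hypothetical extra example I have added to show why "copy the top row" only matches the first puzzle by coincidence, while "extend the pattern" holds up as a general rule. This is an illustration of the idea, not HRM's implementation.

```python
# Toy candidate evaluation for the column-extension puzzle. Grids are rows of
# characters; "." is empty, "R" is red. The second puzzle is a made-up extra
# example used to check which guess works as a general rule.

PUZZLES = [
    # (input grid, expected output)
    ([".R.", ".R.", "..."], [".R.", ".R.", ".R."]),          # puzzle from the article
    (["...", "R..", "R..", "..."], ["...", "R..", "R..", "R.."]),  # hypothetical second puzzle
]

def fill_all_red(grid):
    """Guess 1: paint every cell red."""
    return ["R" * len(row) for row in grid]

def copy_top_row(grid):
    """Guess 2: repeat the top row all the way down."""
    return [grid[0] for _ in grid]

def extend_pattern(grid):
    """Guess 3: continue each colored column downward to the bottom edge."""
    out = [list(row) for row in grid]
    for col in range(len(grid[0])):
        colored = [r for r, row in enumerate(grid) if row[col] != "."]
        if colored:
            color = grid[colored[0]][col]
            for r in range(colored[0], len(grid)):
                out[r][col] = color
    return ["".join(row) for row in out]

candidates = {
    "fill all red": fill_all_red,
    "copy top row": copy_top_row,
    "extend pattern": extend_pattern,
}

# Evaluate every guess on every puzzle; only a rule that works everywhere survives.
for name, rule in candidates.items():
    results = ["right" if rule(inp) == out else "wrong" for inp, out in PUZZLES]
    print(f"{name:15s} -> {results}")
```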
Why this matters
Outer-loop refinement is powerful because it turns reasoning into an iterative search process. Instead of guessing once, the model gets to try, check, and improve. That’s a huge advantage on puzzles like ARC-AGI.
But here’s the key:
- The hierarchical brain-like structure only helped a little.
- The training procedure (outer-loop refinement + puzzle-specific augmentation) did the heavy lifting.
In fact, without the refinement loop, HRM’s performance dropped sharply.
So, breakthrough or hype?
- Promising: HRM shows that smaller, specialized models can sometimes outperform giants like GPT-4 — especially on narrow tasks. That’s exciting, because it suggests we don’t always need trillion-parameter behemoths to make progress.
- But not a brain revolution: The headlines about “AI modeled on the human brain” are misleading. What really boosted performance was a clever training loop and tailored practice, not a fundamentally new type of reasoning.
It’s less “we built a brain” and more “we gave the model multiple tries and smart feedback.”
The takeaway
HRM is an interesting research direction, and outer-loop refinement could inspire new ways of training reasoning systems. But it’s not a magic human-like brain — yet.
The real lesson here: sometimes the secret to better AI isn’t bigger models or radical architectures — it’s how you train and refine them.