Anthropic Wants You To Stop Using Their Best Model — Here's Why That's Smart
Anthropic just shipped the Adviser Strategy: pair cheap Haiku or Sonnet as the executor with Opus as the on-call adviser. Haiku+Opus more than doubled solo Haiku's score on BrowseComp. The math is brutal.
Anthropic just shipped a feature that effectively tells you to stop using their best model. It's called the Adviser Strategy, and the math behind it is the most interesting cost-optimization story Anthropic has released this year.
The pitch: pair a cheap model (Haiku or Sonnet) as the executor with the expensive one (Opus) as an on-call adviser. The executor only calls the adviser when it needs adviser-level intelligence. The rest of the time it handles things itself. Same output quality on hard problems, a fraction of the cost overall.
On the BrowseComp benchmark, Haiku with Opus as adviser scored 41.2%. Haiku solo: 19.7%. More than 2× the score from a routing trick.
The cost wedge that makes this worth doing
The Claude model price gap is wider than people internalize:
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| Opus 4.6 | $5 | $25 |
| Sonnet | $3 | $15 |
| Haiku | $1 | $5 |
Opus output is 5× more expensive than Haiku output. For a long-running agent doing a mix of easy and hard work, that's the difference between a sustainable cost structure and a burn rate that kills the project.
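To make the gap concrete, here is a minimal sketch that prices one workload on each model using the table above. The workload size (2M input tokens, 500k output tokens) is a made-up illustration, not a measured figure.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "opus": {"input": 5.0, "output": 25.0},
    "sonnet": {"input": 3.0, "output": 15.0},
    "haiku": {"input": 1.0, "output": 5.0},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at the listed per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES:
    print(f"{model}: ${cost_usd(model, 2_000_000, 500_000):.2f}")
# opus: $22.50
# sonnet: $13.50
# haiku: $4.50
```

The same token volume costs five times as much on Opus as on Haiku, which is why the share of work that really needs Opus dominates the bill.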
Anthropic's own evals on the Adviser Strategy:
- Sonnet + Opus as adviser vs Sonnet alone: +2.7 percentage points on SWE-bench, ~12% cheaper per agentic task
- Haiku + Opus as adviser vs Haiku alone: 41.2% vs 19.7% on BrowseComp, more than double the score
- Haiku+Opus costs more than Haiku alone, obviously, but it's still cheaper than Opus alone and gets you most of the way to Opus-level performance
How it works mechanically
The Adviser Strategy is a feature of the Messages API, not Claude Code itself. You add an advisor tool entry to the request:
```json
{
  "type": "advisor_20260301",
  "name": "advisor",
  "max_uses": 3
}
```
max_uses caps how many times the executor can escalate to the adviser per task — your cost lever. The executor reads the request and decides on its own whether the problem warrants the adviser. Easy questions get answered directly with Haiku or Sonnet. Hard questions trigger an Opus call, the adviser responds, and the executor folds the advice back into the user-facing answer.
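The cap can be pictured with a toy simulation. This is not how the API implements it: the `is_hard` callback is a hypothetical stand-in for the executor's own judgment, and the assumption that the executor simply carries on alone once the cap is reached is mine, not something the announcement spells out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AdviserBudget:
    """Tracks adviser calls against a max_uses cap for a single task."""
    max_uses: int
    used: int = 0

    def try_consume(self) -> bool:
        """Spend one adviser call if any remain; report whether it succeeded."""
        if self.used >= self.max_uses:
            return False
        self.used += 1
        return True


def run_task(
    steps: List[str],
    is_hard: Callable[[str], bool],  # hypothetical stand-in for executor judgment
    max_uses: int = 3,
) -> List[Tuple[str, str]]:
    """Label each step with who handled it under the cap."""
    budget = AdviserBudget(max_uses)
    handled = []
    for step in steps:
        # The budget is only spent when the executor decides to escalate.
        if is_hard(step) and budget.try_consume():
            handled.append((step, "executor+adviser"))
        else:
            handled.append((step, "executor"))
    return handled


hard_steps = {"refactor", "debug", "design", "prove"}
plan = run_task(
    ["lookup", "refactor", "format", "debug", "design", "prove"],
    is_hard=lambda s: s in hard_steps,
    max_uses=3,
)
for step, who in plan:
    print(f"{step}: {who}")
```

In this run the fourth hard step ("prove") falls back to the executor alone, which is exactly the case worth testing before lowering `max_uses`.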
Nate Herk built a test dashboard to compare Haiku+Opus vs Sonnet+Opus vs Sonnet-solo vs Opus-solo on the same prompts. Three patterns worth noting from his tests:
- The executor doesn't always call the adviser when you'd expect. A 30-day return policy question is borderline — Haiku answered it solo, but Sonnet decided to escalate to Opus on the same question. The model is using its own judgment about what's hard, and the judgment varies by executor.
- The adviser-augmented answers are usually preferred. Even on questions where the cheaper model technically got the right answer, the adviser-augmented version was more aligned with what you'd want to ship to a customer.
- Opus-solo isn't strictly better than Haiku+Opus. In one test, Haiku+Opus gave a more specific answer ("team will reach out within 1-2 business days") than Opus solo ("team will follow up soon"). The architecture matters at least as much as the brain.
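A comparison like the one described above can be sketched as a small harness that runs the same prompts through several configurations and flags the prompts where they disagreed about escalating. The runners here are stubs with made-up escalation rules, not real API calls, and all names (`Result`, `compare`, `disagreements`) are hypothetical; in practice each runner would wrap an actual request.

```python
from typing import Callable, Dict, List, NamedTuple


class Result(NamedTuple):
    answer: str
    escalated: bool  # True if the executor called the adviser


Runner = Callable[[str], Result]


def compare(prompts: List[str], configs: Dict[str, Runner]) -> List[Dict[str, Result]]:
    """Run every prompt through every configuration, one row per prompt."""
    return [{name: run(p) for name, run in configs.items()} for p in prompts]


def disagreements(prompts: List[str], rows: List[Dict[str, Result]]) -> List[str]:
    """Prompts where the configurations did not agree on escalating."""
    return [
        p for p, row in zip(prompts, rows)
        if len({r.escalated for r in row.values()}) > 1
    ]


# Stub runners with invented behavior, standing in for real model calls.
def haiku_plus_opus(prompt: str) -> Result:
    return Result(f"[haiku+opus stub] {prompt}", escalated=False)


def sonnet_plus_opus(prompt: str) -> Result:
    return Result(f"[sonnet+opus stub] {prompt}", escalated="policy" in prompt)


prompts = ["What is your return policy?", "Reset my password"]
rows = compare(prompts, {"haiku+opus": haiku_plus_opus, "sonnet+opus": sonnet_plus_opus})
print(disagreements(prompts, rows))  # ['What is your return policy?']
```

The disagreement list is the useful output: those are the borderline prompts to read by hand.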
The Claude Code equivalent (you already have it)
The Adviser Strategy is API-only, so you can't use it directly inside Claude Code. But Claude Code has a quietly shipped equivalent: /model opusplan.
When you set this, Claude Code uses Opus 4.6 in plan mode and Sonnet 4.6 in execution mode. The expensive reasoning happens during the up-front planning step, where you actually need it; the cheap execution happens during the implementation step, where Sonnet is plenty.
Nate's side-by-side test: same prompt, two terminals, one running opusplan (Opus → Sonnet) and one running pure Opus. By his read the opusplan output was better (more dynamic scrolling, clearer visualizations), and it used significantly less session budget. In that test, the strategy won on both axes.
If you've been wondering why your Claude Code session limit feels tight even on the $200/month plan, the likely answer is that you're running pure Opus when you don't need to. Switching to opusplan as your default can stretch a session roughly 2-3× further.
Where this matters most
The Adviser Strategy is built for long-running agentic workloads where:
- The task is multi-step and only some steps need top-tier reasoning
- You're paying per token, not running under a flat session plan
- Quality on the hard steps actually matters (enough to justify the occasional Opus call on top of a cheaper executor)
If you're building a customer support agent, a research agent, or a code-review agent (anything that runs many times a day on a mix of easy and hard inputs), the Adviser Strategy is a strong default. The 2.7-point SWE-bench gain Sonnet picks up from Opus advising is a meaningful lift for a change that also lowers cost per task.
Caveats Nate names explicitly
- Test before you deploy. He ran 10-20 prompts through each configuration — nowhere near enough for production. Run hundreds.
- Quality vs cost is a real trade-off. Adviser-augmented sometimes gives a slightly different answer than executor-solo. Sometimes that difference is improvement; sometimes it's just change. Decide which one you want before you ship.
- The executor's adviser-call judgment isn't perfect. Use max_uses aggressively to cap downside cost, and test what happens when the executor under-escalates.
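One way to check for under-escalation is to label a set of prompts as hard or easy yourself, log whether the executor escalated on each, and compute the miss rates. The sketch below assumes you already have such logs; the `Record` fields and the sample data are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    prompt_id: str
    is_hard: bool    # your own label, assigned before the run
    escalated: bool  # whether the executor called the adviser


def escalation_report(records: Iterable[Record]) -> Dict[str, float]:
    """Share of hard prompts that were not escalated, and easy ones that were."""
    records = list(records)
    hard = [r for r in records if r.is_hard]
    easy = [r for r in records if not r.is_hard]
    missed = sum(1 for r in hard if not r.escalated)
    extra = sum(1 for r in easy if r.escalated)
    return {
        "n": float(len(records)),
        "under_escalation_rate": missed / len(hard) if hard else 0.0,
        "over_escalation_rate": extra / len(easy) if easy else 0.0,
    }


# Made-up sample: 4 hard prompts (1 missed), 4 easy prompts (2 escalated anyway).
sample = [
    Record("h1", True, True), Record("h2", True, True),
    Record("h3", True, True), Record("h4", True, False),
    Record("e1", False, False), Record("e2", False, False),
    Record("e3", False, True), Record("e4", False, True),
]
report = escalation_report(sample)
print(report)
```

Under-escalation costs quality and over-escalation costs money, so it helps to track both before settling on a `max_uses` value. Eight records is only enough to show the shape of the report; the point about running hundreds still applies.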
What to actually do this week
- In Claude Code: switch your default to opusplan and watch your session usage. Most people will see 2-3× more work per session.
- If you're shipping an agent on the Messages API: add advisor_20260301 with max_uses: 3 as the default and re-run your eval suite. The win is likely real but not guaranteed, so measure.
- For the agentic side projects burning the most tokens: audit which steps actually need Opus. The 80/20 rule applies brutally here: usually 1-2 steps in a 5-step pipeline need the smart model. The rest can run cheaper.
- Push back on "just use the best model." That advice is wrong for almost every workload. The Adviser Strategy is Anthropic's own admission that mixed-model routing beats top-model-only on both cost AND outcome.
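The pipeline audit in the list above comes down to simple arithmetic. Here is a rough sketch for a 5-step pipeline where 2 steps go to Opus and 3 to Haiku, using the prices from the table earlier. The step names and the assumption that every step uses 10k input and 2k output tokens are illustrative only.

```python
# USD per 1M tokens (input, output), from the pricing table earlier in the article.
PRICE_PER_MTOK = {"opus": (5.0, 25.0), "haiku": (1.0, 5.0)}

# Hypothetical per-step token usage, identical for every step.
INPUT_TOKENS, OUTPUT_TOKENS = 10_000, 2_000


def pipeline_cost(assignment: dict) -> float:
    """Total USD cost of one pipeline run given a step -> model assignment."""
    total = 0.0
    for model in assignment.values():
        price_in, price_out = PRICE_PER_MTOK[model]
        total += INPUT_TOKENS * price_in + OUTPUT_TOKENS * price_out
    return total / 1_000_000


steps = ["fetch", "plan", "transform", "review", "format"]  # hypothetical step names
all_opus = {s: "opus" for s in steps}
mixed = {s: ("opus" if s in {"plan", "review"} else "haiku") for s in steps}

cost_all, cost_mixed = pipeline_cost(all_opus), pipeline_cost(mixed)
print(f"all Opus: ${cost_all:.2f}  mixed: ${cost_mixed:.2f}  "
      f"saving: {1 - cost_mixed / cost_all:.0%}")
# all Opus: $0.50  mixed: $0.26  saving: 48%
```

Under these assumptions the mixed pipeline costs about half as much per run. Real savings depend on how tokens are distributed across steps, so plug in your own measurements.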
The Bottom Line
The most important model release of the month isn't a new model. It's the routing layer that lets you stop using the expensive one when you don't have to. Haiku + Opus more than doubled Haiku solo on BrowseComp. Sonnet + Opus beat Sonnet solo on SWE-bench at 12% lower cost. The Adviser Strategy isn't a clever optimization; it's the new default for anyone shipping real workloads. Anthropic just told you their best model is overkill for most tasks. Listen.