Two Ways To Run Claude Code Completely Free (And When That's A Bad Idea)
You can swap the engine inside Claude Code's harness — local Ollama models or OpenRouter free routing — and pay nothing per token. Nate Herk walks both setups. The catch: local Qwen 3.5 9B took 4 minutes to answer one question.
Nate Herk published the most useful Claude Code cost-cutting guide of the quarter: two distinct paths to run Claude Code with $0 in per-token costs. Both work. Both have real trade-offs. And both are 100% inside Anthropic's terms of service — you're using their harness, just swapping out the engine.
Claude Code is the car. Opus 4.6 is the engine. Open the hood and you can put a different engine in.
That's the whole concept. Here's how both paths work, when each one's the right call, and where the catches are.
The two paths
| Path | What you get | Cost | When it makes sense |
|---|---|---|---|
| Local via Ollama | Run open-source models on your own machine | $0 ongoing (after $5 one-time Anthropic credit) | Privacy-sensitive work, fully offline, you have decent hardware |
| OpenRouter free models | Cloud-hosted open models routed through OpenRouter's API | $0 ongoing (after $10 deposit for rate-limit boost) | Better quality than your local machine can handle, internet OK |
Neither is strictly free — you need $5 in Anthropic credits to authenticate the Claude Code CLI in the first place (you won't actually consume them) and ~$10 in OpenRouter credits to bump from 50 free requests/day to 1,000. Call it ~$15 total to set up an infinitely-usable free Claude Code.
How the open-vs-closed gap looks today
The context that makes this worth doing:
- Opus 4.6 / Sonnet 4.6 / GPT 5.5 still lead SWE-bench verified scores by a real margin
- Qwen 3.6, Gemma 4, MiniMax M 2.7 are all closing the gap quarter over quarter
- Sonnet 3.7 (the model everyone freaked out about when it dropped) is no longer in the top 5 — open-source models have already passed it
The takeaway isn't "open models are good enough for everything." It's "open models are good enough for the 70% of your Claude Code work that doesn't need Opus." The cost arbitrage lives in that 70%.
Path 1 — Local via Ollama (5 minutes to set up)
# 1. Download Ollama for your OS from ollama.com
# 2. Pull a model
ollama pull qwen3.5:9b
# 3. Launch Claude Code with your local model
ollama launch claude
# Pick the model from the list
That's it. You're now running Claude Code's harness against a model living on your laptop. All your file reads, code edits, tool calls, and reasoning run locally.
The hard catch: model size vs. your hardware. Nate ran Qwen 3.5 9B (6.6GB) on his machine and asked it one project-context question. Wall time: 4 minutes. The same question against Anthropic's hosted Sonnet returns in 10 seconds.
Smaller open models (Gemma 4 is the new highlight — smallest size + highest Elo on the size/quality chart) are the realistic local target. Don't try to run a 70B model on a MacBook and expect a productive day.
One configuration tip Nate had to discover the hard way: Ollama's default context window for some models is way smaller than the model's advertised limit. Even if Qwen 3.5 says "200K context," Ollama might default to 8K. Run a quick ollama show <model> and create a custom model with explicit context size if you want the full window.
Path 2 — OpenRouter free models (the better path for most people)
This is the one I'd actually recommend for most Claude Code users:
- Sign up at openrouter.ai
- Deposit $10 (bumps rate limit from 50/day → 1,000/day on free models)
- Create an API key in your account
- Edit
.claude/settings.local.jsonin your project with these env vars:
{
"env": {
"ANTHROPIC_BASE_URL": "https://openrouter.ai/api/v1",
"ANTHROPIC_AUTH_TOKEN": "<your OpenRouter key>",
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_MODEL": "qwen/qwen3.6:free",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen/qwen3.6:free"
}
}
Critical: set ALL the model env vars (ANTHROPIC_MODEL, ANTHROPIC_SMALL_FAST_MODEL, etc.) to your chosen free model. Nate's debugging session: he set only the main model, then noticed his account kept getting charged for Haiku. Why? Claude Code defaults to Haiku for small tool calls and file searches; with the small-fast-model unset, those leaked through to the paid path. Setting every model variable plugs every leak.
Once configured, run claude in your project and you'll see "OpenRouter free • API billing usage" at the top instead of your Max plan. Test with a simple prompt. Confirm in your OpenRouter logs that cost is $0.
What's actually free vs. cheap on OpenRouter
OpenRouter has dozens of :free models. The notable ones:
openrouter/auto:free— routes to whichever free model is least rate-limited right now. Convenient but you lose control of which model handles your work.qwen/qwen3.6:free— Qwen 3.6 with 1M-token context, fully free. Probably the best free model on the platform right now.google/gemma-4-31b:free— Gemma 4 31B, also free. Strong reasoning, smaller context.
If you outgrow the free tier or hit rate limits, the next-cheapest paid step is dramatic: Gemma 4 31B paid is $0.14 / 1M input tokens vs Opus 4.6's $5. That's a 35× discount and still uses the Claude Code harness. Even if "free" isn't your goal, "50-100× cheaper than Opus" absolutely is.
Where local + OpenRouter free fail (and Opus still wins)
Nate is honest about this — the trade-offs are real:
- Tool-call visibility. Open models often don't surface the same step-by-step tool reasoning Claude does. You see the result, not the path. Debugging when something goes wrong is harder.
- Web search. Open models on OpenRouter may not have native web-search tooling. You'll need Perplexity or Brave Search as a backup MCP.
- JSON protocol compliance. Claude Code expects a specific tool-calling format. Some open models drift — they'll either skip tools or misformat calls. Quality is workload-dependent.
- Context windows are smaller than Opus's 1M. If your project loads 200K+ tokens of CLAUDE.md + skills + files, you're going to run out fast on most open models.
Which is why the realistic pattern is mixed routing — not all-free, but free-by-default with Opus for the hard 20% of work.
When each path makes sense
Use local Ollama when:
- The work is genuinely sensitive (client code, proprietary research, anything you don't want passing through any third-party server)
- You're offline (planes, remote work, dev hotel WiFi)
- You have enough RAM/GPU to run a serious model (32GB+ for anything useful)
- You're testing or prototyping and quality matters less than speed-of-iteration
Use OpenRouter free models when:
- Anthropic is down (check status.claude.com — happens more than you'd think)
- You've hit your Max session limit and have 2 hours to wait
- You're doing high-volume low-stakes work — summarization, classification, file grep, scaffolding
- You're building automations that hit Claude Code dozens of times a day on routine tasks
Stick with Opus/Sonnet via your Max plan when:
- The work is complex and getting it wrong costs you a re-do (architectural decisions, hard debugging)
- You need full tool-call visibility (you're learning Claude Code or debugging an agent)
- The session is short enough that token cost is irrelevant
What to actually do this week
- Sign up at openrouter.ai and deposit $10. The 1,000-requests/day cap is enough for most weekly Claude Code work.
- Configure one Claude Code project to use Qwen 3.6 free via
.claude/settings.local.json. Run a non-trivial task. See where it's strong and where it stumbles. - Install Ollama and pull Gemma 4 as the local backup. You won't use it daily, but you'll be glad it's there when Anthropic has an outage or you're on a plane.
- Set the
ANTHROPIC_SMALL_FAST_MODELenv var explicitly — this is the single most-missed step and it's the one that quietly leaks paid Haiku calls into your "free" sessions.
The Bottom Line
The Claude Code harness is now decoupled from the Anthropic engine — and that's the most important pricing shift in the agent ecosystem this year. You don't need to leave Claude Code to escape paid-token Hell. Open the hood, swap the engine, route the right work to the right model. Free isn't free (you need hardware or you're rate-limited), but 50-100× cheaper is real, and 50-100× cheaper for the routine 70% of your work pays for itself by Tuesday.