Two Ways To Run Claude Code Completely Free (And When That's A Bad Idea)

You can swap the engine inside Claude Code's harness for local Ollama models or OpenRouter free routing and pay nothing per token. Nate Herk walks through both setups. The catch: local Qwen 3.5 9B took 4 minutes to answer one question.

The VIP Desk

5 min read·May 13, 2026·Summarizing Nate Herk

Nate Herk published the most useful Claude Code cost-cutting guide of the quarter: two distinct paths to run Claude Code with $0 in per-token costs. Both work. Both have real trade-offs. And both are 100% inside Anthropic's terms of service — you're using their harness, just swapping out the engine.

Claude Code is the car. Opus 4.6 is the engine. Open the hood and you can put a different engine in.

That's the whole concept. Here's how both paths work, when each one's the right call, and where the catches are.

The two paths

Path	What you get	Cost	When it makes sense
Local via Ollama	Run open-source models on your own machine	$0 ongoing (after $5 one-time Anthropic credit)	Privacy-sensitive work, fully offline, you have decent hardware
OpenRouter free models	Cloud-hosted open models routed through OpenRouter's API	$0 ongoing (after $10 deposit for rate-limit boost)	Better quality than your local machine can handle, internet OK

Neither is strictly free. You need $5 in Anthropic credits to authenticate the Claude Code CLI in the first place (you won't actually consume them) and ~$10 in OpenRouter credits to raise the free-model limit from 50 requests/day to 1,000. Call it ~$15 total in one-time setup for a Claude Code with no per-token bill.

How the open-vs-closed gap looks today

The context that makes this worth doing:

Opus 4.6 / Sonnet 4.6 / GPT 5.5 still lead SWE-bench Verified scores by a real margin
Qwen 3.6, Gemma 4, MiniMax M2.7 are all closing the gap quarter over quarter
Sonnet 3.7 (the model everyone freaked out about when it dropped) is no longer in the top 5 — open-source models have already passed it

The takeaway isn't "open models are good enough for everything." It's "open models are good enough for the 70% of your Claude Code work that doesn't need Opus." The cost arbitrage lives in that 70%.

Path 1 — Local via Ollama (5 minutes to set up)

# 1. Download Ollama for your OS from ollama.com
# 2. Pull a model
ollama pull qwen3.5:9b

# 3. Launch Claude Code with your local model
ollama launch claude
# Pick the model from the list

That's it. You're now running Claude Code's harness against a model living on your laptop. All your file reads, code edits, tool calls, and reasoning run locally.

The hard catch: model size vs. your hardware. Nate ran Qwen 3.5 9B (6.6GB) on his machine and asked it one project-context question. Wall time: 4 minutes. The same question against Anthropic's hosted Sonnet returns in about 10 seconds, roughly 24× faster.

Smaller open models are the realistic local target. Gemma 4 is the new highlight: on the size/quality chart it pairs the smallest size with the highest Elo. Don't try to run a 70B model on a MacBook and expect a productive day.

One configuration tip Nate had to discover the hard way: Ollama's default context window for some models is far smaller than the model's advertised limit. Even if Qwen 3.5 advertises "200K context," Ollama might default to 8K. Run ollama show <model> to check, and create a custom model with an explicit context size if you want the full window.
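A minimal sketch of that custom-model step, assuming Ollama's Modelfile format (a FROM line plus PARAMETER num_ctx). The model tag comes from the setup above; the 32768 window and the qwen3.5-9b-32k name are placeholders, not values from Nate's walkthrough.

```python
def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that pins an explicit context window."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive token count")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


if __name__ == "__main__":
    # Placeholder values: pick a window your RAM can actually hold.
    print(render_modelfile("qwen3.5:9b", 32768), end="")
    # Save the printed text as a file named Modelfile, then build the
    # variant from a shell (the new model name is hypothetical):
    #   ollama create qwen3.5-9b-32k -f Modelfile
```

A larger window costs memory, so raising it all the way to the advertised limit may not be practical on a laptop.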

Path 2 — OpenRouter free models (the better path for most people)

This is the one I'd actually recommend for most Claude Code users:

Sign up at openrouter.ai
Deposit $10 (bumps rate limit from 50/day → 1,000/day on free models)
Create an API key in your account
Edit .claude/settings.local.json in your project with these env vars:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api/v1",
    "ANTHROPIC_AUTH_TOKEN": "<your OpenRouter key>",
    "ANTHROPIC_API_KEY": "",
    "ANTHROPIC_MODEL": "qwen/qwen3.6:free",
    "ANTHROPIC_SMALL_FAST_MODEL": "qwen/qwen3.6:free"
  }
}

Critical: set ALL the model env vars (ANTHROPIC_MODEL, ANTHROPIC_SMALL_FAST_MODEL, etc.) to your chosen free model. Nate hit this while debugging: he set only the main model, then noticed his account kept getting charged for Haiku. Why? Claude Code defaults to Haiku for small tool calls and file searches, so with the small-fast model unset, those calls leaked through to the paid path. Setting every model variable plugs every leak.
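One way to catch that leak before it costs anything is to lint the settings file. The sketch below is an illustration, not part of Nate's setup: it assumes free models carry the ":free" suffix used in this article and checks only the two variables named above, so extend MODEL_VARS if your Claude Code version reads others.

```python
import json

# Variables named in the settings above. Other model-related variables may
# exist in your Claude Code version; add them here if so.
MODEL_VARS = ("ANTHROPIC_MODEL", "ANTHROPIC_SMALL_FAST_MODEL")


def find_paid_leaks(settings_text: str) -> list[str]:
    """Return model variables that are unset or not pinned to a ':free' model."""
    env = json.loads(settings_text).get("env", {})
    return [
        name for name in MODEL_VARS
        if not str(env.get(name, "")).endswith(":free")
    ]


if __name__ == "__main__":
    # Only the main model is set, which reproduces the mistake described above.
    sample = '{"env": {"ANTHROPIC_MODEL": "qwen/qwen3.6:free"}}'
    print(find_paid_leaks(sample))  # ['ANTHROPIC_SMALL_FAST_MODEL']
```

An empty list means every listed variable points at a free model. To run it against a real project, pass in the contents of .claude/settings.local.json.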

Once configured, run claude in your project and you'll see "OpenRouter free • API billing usage" at the top instead of your Max plan. Test with a simple prompt. Confirm in your OpenRouter logs that cost is $0.

What's actually free vs. cheap on OpenRouter

OpenRouter has dozens of :free models. The notable ones:

openrouter/auto:free — routes to whichever free model is least rate-limited right now. Convenient but you lose control of which model handles your work.
qwen/qwen3.6:free — Qwen 3.6 with 1M-token context, fully free. Probably the best free model on the platform right now.
google/gemma-4-31b:free — Gemma 4 31B, also free. Strong reasoning, smaller context.

If you outgrow the free tier or hit rate limits, the cheapest paid step is still a steep discount: Gemma 4 31B paid is $0.14 per 1M input tokens versus $5 for Opus 4.6. That's roughly 35× cheaper on input, and it still uses the Claude Code harness. Even if "free" isn't your goal, "dozens of times cheaper than Opus" absolutely is.
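The arithmetic behind that comparison, as a small sketch. The two per-million input prices are the ones quoted above; the 10M-token monthly volume is a made-up figure for illustration, and output-token pricing is not covered.

```python
def price_ratio(expensive_per_million: float, cheap_per_million: float) -> float:
    """How many times cheaper the second price is than the first."""
    return expensive_per_million / cheap_per_million


def input_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of sending `tokens` input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    opus, gemma = 5.00, 0.14  # USD per 1M input tokens, as quoted above
    print(f"{price_ratio(opus, gemma):.1f}x")  # 35.7x

    tokens = 10_000_000  # hypothetical monthly input volume
    print(f"Opus:  ${input_cost(tokens, opus):.2f}")   # Opus:  $50.00
    print(f"Gemma: ${input_cost(tokens, gemma):.2f}")  # Gemma: $1.40
```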

Where local + OpenRouter free fail (and Opus still wins)

Nate is honest about this — the trade-offs are real:

Tool-call visibility. Open models often don't surface the same step-by-step tool reasoning Claude does. You see the result, not the path. Debugging when something goes wrong is harder.
Web search. Open models on OpenRouter may not have native web-search tooling. You'll need Perplexity or Brave Search as a backup MCP.
JSON protocol compliance. Claude Code expects a specific tool-calling format. Some open models drift — they'll either skip tools or misformat calls. Quality is workload-dependent.
Context windows are often smaller than Opus's 1M (Qwen 3.6's 1M window, noted above, is the exception). If your project loads 200K+ tokens of CLAUDE.md + skills + files, you're going to run out fast on most open models.
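To get a feel for the context-window limit before switching models, a rough budget check helps. This sketch assumes about 4 characters per token, which is a common rule of thumb rather than a real tokenizer, and the 128K window in the example is a hypothetical smaller model, not a figure from the article.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def fits_context(texts: list[str], window_tokens: int, reserve_tokens: int = 0) -> bool:
    """True if the estimated prompt size leaves reserve_tokens free in the window."""
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserve_tokens <= window_tokens


if __name__ == "__main__":
    # Stand-in for ~200K tokens of CLAUDE.md, skills, and files.
    project = ["x" * 800_000]
    print(fits_context(project, window_tokens=1_000_000))  # True
    print(fits_context(project, window_tokens=128_000))    # False
```

Use reserve_tokens to hold back room for the model's replies and tool output, since the loaded files are only part of what fills the window.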

Which is why the realistic pattern is mixed routing: not all-free, but free-by-default, with Opus reserved for the hard minority of work that needs it.

When each path makes sense

Use local Ollama when:

The work is genuinely sensitive (client code, proprietary research, anything you don't want passing through any third-party server)
You're offline (planes, remote work, flaky hotel WiFi)
You have enough RAM/GPU to run a serious model (32GB+ for anything useful)
You're testing or prototyping and quality matters less than speed-of-iteration

Use OpenRouter free models when:

Anthropic is down (check status.claude.com — happens more than you'd think)
You've hit your Max session limit and have 2 hours to wait
You're doing high-volume low-stakes work — summarization, classification, file grep, scaffolding
You're building automations that hit Claude Code dozens of times a day on routine tasks

Stick with Opus/Sonnet via your Max plan when:

The work is complex and getting it wrong costs you a re-do (architectural decisions, hard debugging)
You need full tool-call visibility (you're learning Claude Code or debugging an agent)
The session is short enough that token cost is irrelevant
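The three lists above can be read as a small routing rule. Here is a hedged sketch of it: the Task fields and the backend labels are hypothetical names invented for illustration, and the ordering (privacy and offline needs win over complexity) is one reasonable reading of the lists, not something Nate specifies.

```python
from dataclasses import dataclass


@dataclass
class Task:
    sensitive: bool = False  # must not pass through any third-party server
    offline: bool = False    # no network available
    complex: bool = False    # a wrong answer means an expensive re-do


def choose_backend(task: Task) -> str:
    """Pick a backend label for a task; labels are illustrative only."""
    if task.sensitive or task.offline:
        return "ollama-local"      # only option that keeps work on the machine
    if task.complex:
        return "anthropic-max"     # Opus/Sonnet for the hard cases
    return "openrouter-free"       # default for routine, low-stakes work


if __name__ == "__main__":
    print(choose_backend(Task()))                              # openrouter-free
    print(choose_backend(Task(complex=True)))                  # anthropic-max
    print(choose_backend(Task(sensitive=True, complex=True)))  # ollama-local
```

Checking sensitivity first means a complex but confidential task stays local and slow, which trades quality for privacy. Flip the order if your own constraints differ.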

What to actually do this week

Sign up at openrouter.ai and deposit $10. The 1,000-requests/day cap is enough for most day-to-day Claude Code work.
Configure one Claude Code project to use Qwen 3.6 free via .claude/settings.local.json. Run a non-trivial task. See where it's strong and where it stumbles.
Install Ollama and pull Gemma 4 as the local backup. You won't use it daily, but you'll be glad it's there when Anthropic has an outage or you're on a plane.
Set the ANTHROPIC_SMALL_FAST_MODEL env var explicitly — this is the single most-missed step and it's the one that quietly leaks paid Haiku calls into your "free" sessions.

The Bottom Line

The Claude Code harness is now decoupled from the Anthropic engine, and that's the most important pricing shift in the agent ecosystem this year. You don't need to leave Claude Code to escape paid-token hell. Open the hood, swap the engine, route the right work to the right model. Free isn't free (you need hardware or you're rate-limited), but dozens of times cheaper is real, and dozens of times cheaper for the routine 70% of your work pays for itself by Tuesday.
