Why Your 30th Claude Code Message Costs 31x More Than Your First

98.5% of the tokens in a 100-message Claude Code session are spent re-reading old context. Nate Herk's 18 token-management hacks — three tiers from beginner to advanced — are the difference between hitting your session limit at hour 1 and hour 5.

T
The VIP Desk
5 min read·May 13, 2026·Summarizing Nate Herk
the-prompt-vip

Every Claude Code user is suddenly hitting their session limit faster, even on the $200/month plan. The threads on X are full of the same complaint: "One prompt that used to be 1% of my session is now 10%." Anthropic admitted it — they've adjusted peak-hour throttling and the bottleneck moved.

But underneath the throttling change, there's a deeper truth almost nobody internalizes: your 30th Claude Code message costs 31× more than your first one. Not because the model got slower. Because every message re-reads the entire conversation from the beginning.

One developer tracked a 100-message session and found that 98.5% of all tokens were spent re-reading the chat history. Not generating answers. Re-reading.

Nate Herk just dropped an 18-hack guide organized into three tiers, and it's the most concrete cost-discipline content I've seen for Claude Code this quarter. Here's the consolidated take.

The token-compounding math you need to internalize

Every message in a Claude Code conversation forces the model to re-read:

  • The entire conversation history (every previous message + every previous response)
  • Your CLAUDE.md file (every time, every message)
  • All connected MCP servers' tool definitions
  • The system prompt
  • Your custom agents and skills metadata
  • Any files referenced earlier in the session

Message 1 might cost 500 tokens. Message 30 might cost 15,500 — 31× more. And it's not even producing better answers, because of a phenomenon called "loss in the middle" — models pay the most attention at the beginning and end of context, with the middle getting effectively ignored. So you're paying more and getting worse outputs as the conversation grows.

This isn't a limit problem. It's a context-hygiene problem. Here's the playbook to fix it.

Tier 1 — The hygiene basics (9 hacks)

  1. /clear between unrelated tasks. The single biggest win. Every long chat is exponentially more expensive than the same prompt in a fresh chat.
  2. Disconnect unused MCP servers. A single MCP can load ~18,000 tokens per message just for tool definitions. Run /mcp at the start of each session and prune. Where you can, use a CLI instead of the corresponding MCP — Google Workspace CLI beats Google Workspace MCP.
  3. Batch prompts into one message. Three sequential messages cost 3× what one combined message costs. Edit your original message and regenerate instead of stacking follow-up corrections.
  4. Plan mode before any real task. The biggest token waste is Claude going down the wrong path, writing code, and then having to scrap it all. Plan mode lets it ask clarifying questions first.
  5. /context and /cost commands. /context shows you exactly where tokens are going right now (history, MCP overhead, files). /cost shows actual spend. Run these at the start and middle of every session.
  6. Set up a status line showing current model and token usage. The terminal status bar makes the invisible bleed visible.
  7. Keep the Claude dashboard open in a tab so you can pace yourself against your remaining allocation.
  8. Be surgical with pasting. If the bug is in one function, paste the function, not the file. Don't dump entire files when one section will do.
  9. Watch Claude work. Especially on longer tasks — if it goes down the wrong path or gets stuck in a loop, kill it. Don't let it burn 80% of your tokens producing zero value.

Tier 2 — Architecture and discipline (5 hacks)

  1. Keep CLAUDE.md under 200 lines. It's read on every single message. Treat it as an index, not a content store. Stack stack, conventions, build commands, the 95%-confidence rule. Point at files where deeper data lives.
  2. Surgical file references. Don't say "check the repo for bugs" — say "check the verifyUser function in auth.js." Use @filename to lock Claude to a specific file.
  3. Compact at 60%, not 95%. Auto-compact triggers at 95% by which point your context is already degraded. Run /compact manually around 60% with explicit instructions on what to preserve. After 3-4 compacts in a row, summarize and /clear instead — quality degrades from compact stacking.
  4. The 5-minute cache TTL. Claude Code's prompt cache expires after 5 minutes. Step away for a bathroom break and come back, your next message reprocesses everything at full cost. If you're leaving for >5 min, /compact or /clear before walking away.
  5. Command output bloat. Shell commands' full output enters context. A git log with 200 commits = 200 commits' worth of tokens. Deny noisy commands in project permissions; cap output where you can.

Tier 3 — Power user moves (4 hacks)

  1. Pick the right model per task. Sonnet for default coding. Haiku for sub-agents, formatting, simple tasks. Opus only for deep architectural planning — keep it under 20% of usage. For huge codebase reviews, delegate to Codex via the official plugin so you save Claude tokens entirely.
  2. Sub-agents cost 7-10× more. Every sub-agent wakes up with its own full context — system prompt, tools, MCP definitions. The win is when the sub-agent uses Haiku for a one-off task that doesn't need the main session's context. Use sparingly and route to cheap models.
  3. Run heavy sessions off-peak. Anthropic now drains your 5-hour session window faster during peak hours (8am-2pm ET weekdays). Big refactors, multi-agent runs, big projects → evenings and weekends.
  4. CLAUDE.md as a constitution. Store stable decisions, architecture rules, and progress summaries here. Save decisions, not conversations. Every architectural call you store there is a paragraph you never have to type again.

The self-evolving CLAUDE.md trick

Nate's bonus hack worth flagging separately: a self-learning CLAUDE.md section. Add this near the bottom of yours:

"Applied learning. When something fails repeatedly, when I have to re-explain, or when a workaround is found for a platform/tool/limitation, add a one-line bullet here. Keep each bullet under 15 words. No explanations. Only add things that will save time in future sessions."

Claude appends a bullet whenever it hits a recurring issue. Over a month, you accumulate a curated list of things-you've-already-explained that prevents you ever explaining them again. Audit it weekly — the risk is it bloats faster than you'd think.

The mindset shift Nate names at the end

The video closes on a counterintuitive line worth pulling out:

"Hitting your limit shouldn't have a negative connotation. If you're optimizing tokens and still hitting your limit, that means you're using this tool so much that you're a power user — which is what you want to be."

The goal isn't don't hit the limit. The goal is get the most leverage per token before you hit it. People not hitting their limit are leaving the most expensive tool in their stack underused.

What to actually do this week

Three things to start tonight:

  1. Run /context on your next session. See how many tokens you're at before you've even sent a message. Most people are shocked to find 50,000+ tokens of overhead from MCP servers and a bloated CLAUDE.md.
  2. Trim your CLAUDE.md to under 200 lines and disconnect MCP servers you don't use this week. These two changes alone can 2× your session life.
  3. Switch your default to /model opus_plan so Opus handles planning and Sonnet handles execution. Combined with the hygiene above, your $200/month plan should feel like a $400/month plan.

The Bottom Line

Most people who hit their Claude Code session limit don't need a bigger plan. They need to stop re-sending their entire conversation history 30 times when they could be sending it 5 times. Token compounding is the invisible tax on lazy context management; the discipline above is the antidote. Run the hygiene week-one and your session feels like a different product. It's not a limits problem. It's a context-hygiene problem.

the-prompt-vipClaude Code tokensClaude limit hackssession managementCLAUDE.mdMCP server tokensplan modeClaude Code optimizationNate Herktoken compoundinglost in the middle