Why Your 30th Claude Code Message Costs 31x More Than Your First
98.5% of the tokens in a 100-message Claude Code session are spent re-reading old context. Nate Herk's 18 token-management hacks — three tiers from beginner to advanced — are the difference between hitting your session limit at hour 1 and hour 5.
Every Claude Code user is suddenly hitting their session limit faster, even on the $200/month plan. The threads on X are full of the same complaint: "One prompt that used to be 1% of my session is now 10%." Anthropic admitted it — they've adjusted peak-hour throttling and the bottleneck moved.
But underneath the throttling change, there's a deeper truth almost nobody internalizes: your 30th Claude Code message costs 31× more than your first one. Not because the model got slower. Because every message re-reads the entire conversation from the beginning.
One developer tracked a 100-message session and found that 98.5% of all tokens were spent re-reading the chat history. Not generating answers. Re-reading.
Nate Herk just dropped an 18-hack guide organized into three tiers, and it's the most concrete cost-discipline content I've seen for Claude Code this quarter. Here's the consolidated take.
The token-compounding math you need to internalize
Every message in a Claude Code conversation forces the model to re-read:
- The entire conversation history (every previous message + every previous response)
- Your
CLAUDE.mdfile (every time, every message) - All connected MCP servers' tool definitions
- The system prompt
- Your custom agents and skills metadata
- Any files referenced earlier in the session
Message 1 might cost 500 tokens. Message 30 might cost 15,500 — 31× more. And it's not even producing better answers, because of a phenomenon called "loss in the middle" — models pay the most attention at the beginning and end of context, with the middle getting effectively ignored. So you're paying more and getting worse outputs as the conversation grows.
This isn't a limit problem. It's a context-hygiene problem. Here's the playbook to fix it.
Tier 1 — The hygiene basics (9 hacks)
/clearbetween unrelated tasks. The single biggest win. Every long chat is exponentially more expensive than the same prompt in a fresh chat.- Disconnect unused MCP servers. A single MCP can load ~18,000 tokens per message just for tool definitions. Run
/mcpat the start of each session and prune. Where you can, use a CLI instead of the corresponding MCP — Google Workspace CLI beats Google Workspace MCP. - Batch prompts into one message. Three sequential messages cost 3× what one combined message costs. Edit your original message and regenerate instead of stacking follow-up corrections.
- Plan mode before any real task. The biggest token waste is Claude going down the wrong path, writing code, and then having to scrap it all. Plan mode lets it ask clarifying questions first.
/contextand/costcommands./contextshows you exactly where tokens are going right now (history, MCP overhead, files)./costshows actual spend. Run these at the start and middle of every session.- Set up a status line showing current model and token usage. The terminal status bar makes the invisible bleed visible.
- Keep the Claude dashboard open in a tab so you can pace yourself against your remaining allocation.
- Be surgical with pasting. If the bug is in one function, paste the function, not the file. Don't dump entire files when one section will do.
- Watch Claude work. Especially on longer tasks — if it goes down the wrong path or gets stuck in a loop, kill it. Don't let it burn 80% of your tokens producing zero value.
Tier 2 — Architecture and discipline (5 hacks)
- Keep
CLAUDE.mdunder 200 lines. It's read on every single message. Treat it as an index, not a content store. Stack stack, conventions, build commands, the 95%-confidence rule. Point at files where deeper data lives. - Surgical file references. Don't say "check the repo for bugs" — say "check the
verifyUserfunction inauth.js." Use@filenameto lock Claude to a specific file. - Compact at 60%, not 95%. Auto-compact triggers at 95% by which point your context is already degraded. Run
/compactmanually around 60% with explicit instructions on what to preserve. After 3-4 compacts in a row, summarize and/clearinstead — quality degrades from compact stacking. - The 5-minute cache TTL. Claude Code's prompt cache expires after 5 minutes. Step away for a bathroom break and come back, your next message reprocesses everything at full cost. If you're leaving for >5 min,
/compactor/clearbefore walking away. - Command output bloat. Shell commands' full output enters context. A
git logwith 200 commits = 200 commits' worth of tokens. Deny noisy commands in project permissions; cap output where you can.
Tier 3 — Power user moves (4 hacks)
- Pick the right model per task. Sonnet for default coding. Haiku for sub-agents, formatting, simple tasks. Opus only for deep architectural planning — keep it under 20% of usage. For huge codebase reviews, delegate to Codex via the official plugin so you save Claude tokens entirely.
- Sub-agents cost 7-10× more. Every sub-agent wakes up with its own full context — system prompt, tools, MCP definitions. The win is when the sub-agent uses Haiku for a one-off task that doesn't need the main session's context. Use sparingly and route to cheap models.
- Run heavy sessions off-peak. Anthropic now drains your 5-hour session window faster during peak hours (8am-2pm ET weekdays). Big refactors, multi-agent runs, big projects → evenings and weekends.
CLAUDE.mdas a constitution. Store stable decisions, architecture rules, and progress summaries here. Save decisions, not conversations. Every architectural call you store there is a paragraph you never have to type again.
The self-evolving CLAUDE.md trick
Nate's bonus hack worth flagging separately: a self-learning CLAUDE.md section. Add this near the bottom of yours:
"Applied learning. When something fails repeatedly, when I have to re-explain, or when a workaround is found for a platform/tool/limitation, add a one-line bullet here. Keep each bullet under 15 words. No explanations. Only add things that will save time in future sessions."
Claude appends a bullet whenever it hits a recurring issue. Over a month, you accumulate a curated list of things-you've-already-explained that prevents you ever explaining them again. Audit it weekly — the risk is it bloats faster than you'd think.
The mindset shift Nate names at the end
The video closes on a counterintuitive line worth pulling out:
"Hitting your limit shouldn't have a negative connotation. If you're optimizing tokens and still hitting your limit, that means you're using this tool so much that you're a power user — which is what you want to be."
The goal isn't don't hit the limit. The goal is get the most leverage per token before you hit it. People not hitting their limit are leaving the most expensive tool in their stack underused.
What to actually do this week
Three things to start tonight:
- Run
/contexton your next session. See how many tokens you're at before you've even sent a message. Most people are shocked to find 50,000+ tokens of overhead from MCP servers and a bloated CLAUDE.md. - Trim your
CLAUDE.mdto under 200 lines and disconnect MCP servers you don't use this week. These two changes alone can 2× your session life. - Switch your default to
/model opus_planso Opus handles planning and Sonnet handles execution. Combined with the hygiene above, your $200/month plan should feel like a $400/month plan.
The Bottom Line
Most people who hit their Claude Code session limit don't need a bigger plan. They need to stop re-sending their entire conversation history 30 times when they could be sending it 5 times. Token compounding is the invisible tax on lazy context management; the discipline above is the antidote. Run the hygiene week-one and your session feels like a different product. It's not a limits problem. It's a context-hygiene problem.