A common intuition among developers: Claude Code sessions get expensive over time. Context keeps accumulating, every turn resends the entire history, and token costs add up linearly. The obvious conclusion: /clear often, start fresh sessions for each task to save money.
That reasoning is half right and half wrong. The wrong half comes from leaving prompt caching out of the cost model. In practice, frequent /clear can cost more than keeping a long session alive.
Cumulative Context Really Does Cost
Start with what’s true. LLM APIs are stateless β every API call must resend the entire conversation history. After 10 turns with Claude, the 11th request contains all 10 previous turns plus your new question.
So yes, the input token count per call grows linearly as the session gets longer. It’s reasonable to conclude “longer sessions cost more.”
But Prompt Caching Changes the Rules
Anthropic introduced prompt caching in 2024, and Claude Code enables it by default. The rule is simple: identical prefixes only cost 10% of the normal price.
Sonnet 4.6 pricing:
| Type | Price (per million tokens) | Relative |
|---|---|---|
| Base input (uncached) | $3.00 | 100% |
| 5-minute cache write | $3.75 | 125% |
| 1-hour cache write | $6.00 | 200% |
| Cache read | $0.30 | 10% |
Opus is even more dramatic: base input $5, cache read only $0.50.
Meaning: the first time you send a large context block, it gets written to the cache and you pay a small write premium (25% above base). For the next 5 minutes, resending the same prefix costs 10% of base. The longer the session and the more cache hits accumulate, the lower your average per-token cost.
What “Prefix” Actually Means
Before going further, it’s worth unpacking the word “prefix.” A prompt is an ordered sequence of tokens. Cache matching runs from the very beginning, token by token β and a single differing token breaks everything after it.
In multi-turn conversations, every new turn only appends to the tail; the prior history stays untouched:
| |
So long conversations aren’t a cost disadvantage β they’re an advantage. The longer the accumulated history, the more tokens per turn get the 90% discount.
But this only holds while you’re strictly appending. If you could go back and edit Turn 5, every token after Turn 5 β even ones that look identical β invalidates because the prefix hash diverges from that point onward. That’s the cruelty of “prefix”: change one character in the middle and everything downstream is lost.
Analogy: git commit hashes. Tweak any historical commit and every hash after it changes.
Topic Switching: How Cache Bills Across A β B β C
The most-overlooked scenario: you discuss topic A with Claude, finish, move to topic B, then topic C β without /clear in between. A’s and B’s histories stay glued to the prompt prefix, getting billed at 10% on every single turn while they ride along.
Concretely:
| |
20 turns on C means an extra 20 Γ $0.015 = $0.30 spent “carrying corpses.” A and B may contribute nothing to C, but you’re paying for them to ride along.
Rule of thumb for when to /clear:
- A, B, C are independent (frontend in the morning / SQL in the afternoon / CI at night) β
/clearbetween topics - A, B, C reference each other (A defines spec / B implements / C debugs B) β don’t clear; the 10% price on history is cheap and useful
- History is heavy but its conclusion is condensable (A was a 50K doc you read) β clear, then paste a short summary of A’s conclusions as new context
A common misconception: “Claude Code automatically detects topic changes and drops stale content.” It doesn’t. Cache is mechanical prefix matching β it has no semantic understanding. Deciding what to forget is entirely a human responsibility: either /clear manually, or let auto-compact fire based on context usage (not topic).
The Three Variables That Actually Drive Cost
So the cost model isn’t “context size Γ number of turns.” It’s these three factors:
1. Cache Hit Rate
In a long, continuous session, every turn’s prefix hits the cache written by the previous turn. If a session has accumulated 50K tokens and turn 11 adds 2K new input:
- Without cache: 51K Γ $3 = $0.153
- With cache: 50K Γ $0.30 + 2K Γ $3 = $0.021
About a 7x difference.
What’s worst about aggressive /clear: every new session re-reads CLAUDE.md, re-learns your project files, re-warms the cache. These warm-up costs can easily exceed the “savings” from keeping context small.
2. Cache Invalidation
Cache requires 100% identical prefixes to hit. These actions invalidate it β some loudly, some quietly:
| Event | Impact |
|---|---|
| Editing message N | Everything from N onward invalidates (earlier still cached) |
| Adding/removing an MCP tool | Full invalidation (tool schemas sit at the front) |
| Switching Sonnet β Opus | Different model, different cache β starts over |
| Toggling web search / citations | system + message cache invalidates |
| Idle > 5 minutes (TTL expires) | Cache evaporates; next call pays 100% to rewrite |
| Auto-compact fires | Prefix is replaced by a summary; subsequent turns warm a fresh cache |
/clear | Everything resets |
Idle-over-5-minutes is the sneakiest β grab lunch, come back, type a question, and you’ve quietly paid full write price without any UI warning.
A nuance about auto-compact worth clarifying: per Anthropic’s implementation, the compaction API call itself sends the same prefix as the turn before it, so that call is a cache hit. The real cost lands after compaction β the new session uses the summary in place of the original history as its prefix, so every subsequent turn is warming a brand-new cache from that point on. The net cost is similar to “cache blown away,” but the mechanism is prefix replacement, not cache invalidation.
3. TTL (5 Minutes vs 1 Hour)
Default cache TTL is 5 minutes. Pause for more than 5 minutes and the cache expires β the next call pays full base input price.
Anthropic offers a 1-hour TTL option at 2x write cost ($6 vs $3) in exchange for longer persistence. Whether it’s worth it depends on rhythm β bursty work with 10β30 minute gaps may benefit; continuous work never hits the timeout anyway.
Counterintuitive: When Long Sessions Are Cheapest
Combine all three and you arrive at the opposite of “longer = more expensive”:
Long sessions are cheapest when:
- Work is continuous, turns within 5 minutes of each other
- No editing of history, no model switches, no MCP churn
- Context stays below the compaction threshold (~155K safe zone)
Short sessions / frequent clearing are costliest when:
- Every new session re-reads large context (CLAUDE.md, multiple files, skill definitions)
- Every new session pays a “cache warm-up tax”
- You never reap the 10% cache-read discount
My own experience: two hours of continuous work in one session often costs less than splitting the same work into four independent 30-minute sessions β because the latter pays four cold starts.
When Large Context Is Genuinely a Problem
None of this means context can grow forever with no consequence. Two thresholds turn “large context” from a cost problem into a quality problem:
1. Approaching the context window limit (Sonnet 200K / 1M, Opus 200K)
Model attention degrades past ~100K tokens, especially on content in the middle (the “lost in the middle” phenomenon). At this point the concern isn’t cost β it’s that the model can’t find or misuses what you gave it earlier.
2. Auto-compact triggers
Claude Code auto-compacts as you approach the limit. Compaction is a major operation β cache fully invalidates, cost spikes, and the result is a summary with possible detail loss.
So context shouldn’t grow unbounded, but the right reset trigger is “task complete” or “about to hit compaction,” not “session has been open for X hours.”
Practical Recommendations
| Situation | Recommendation |
|---|---|
| Mid-task | Don’t /clear, continue the session |
| Task done, starting a new one | /clear so the next session starts clean |
| Idle for >5 minutes | Use /resume instead of opening a new session (TTL expires but history is preserved) |
| Claude Code open often with idle gaps | Consider 1h TTL β 2x write cost but idle safety |
| Context exceeds 155K | Proactively end the session; don’t wait for auto-compact |
To measure your actual cost, try ccusage or claude-view. A high share of cache_read_input_tokens means you’re working efficiently; rising cache_creation_input_tokens with low reads means cache keeps invalidating β you’re burning money.
“Longer sessions waste more tokens” is a stateless-era intuition, but prompt caching has been rewriting those rules for two years. Check how you actually use Claude Code β the token savings might surprise you.
