<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Cost-Optimization on recca0120 Tech Notes</title><link>https://recca0120.github.io/en/tags/cost-optimization/</link><description>Recent content in Cost-Optimization on recca0120 Tech Notes</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 13 Apr 2026 18:00:00 +0800</lastBuildDate><atom:link href="https://recca0120.github.io/en/tags/cost-optimization/index.xml" rel="self" type="application/rss+xml"/><item><title>Does a Long Claude Code Session Waste Tokens? A Cost Model Most People Get Wrong</title><link>https://recca0120.github.io/en/2026/04/13/claude-code-session-cost-cache-misconception/</link><pubDate>Mon, 13 Apr 2026 18:00:00 +0800</pubDate><guid>https://recca0120.github.io/en/2026/04/13/claude-code-session-cost-cache-misconception/</guid><description>&lt;p&gt;A common intuition among developers: Claude Code sessions get expensive over time. Context keeps accumulating, every turn resends the entire history, and token costs add up linearly. The obvious conclusion: &lt;code&gt;/clear&lt;/code&gt; often, start fresh sessions for each task to save money.&lt;/p&gt;
&lt;p&gt;That reasoning is &lt;strong&gt;half right and half wrong&lt;/strong&gt;. The wrong half comes from leaving prompt caching out of the cost model. In practice, &lt;strong&gt;frequent &lt;code&gt;/clear&lt;/code&gt; can cost more than keeping a long session alive&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="cumulative-context-really-does-cost"&gt;&lt;a href="#cumulative-context-really-does-cost" class="header-anchor"&gt;&lt;/a&gt;Cumulative Context Really Does Cost
&lt;/h2&gt;&lt;p&gt;Start with what&amp;rsquo;s true. LLM APIs are stateless — every API call must resend the entire conversation history. After 10 turns with Claude, the 11th request contains all 10 previous turns plus your new question.&lt;/p&gt;
&lt;p&gt;So yes, the input token count per call grows linearly as the session gets longer. It&amp;rsquo;s reasonable to conclude &amp;ldquo;longer sessions cost more.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="but-prompt-caching-changes-the-rules"&gt;&lt;a href="#but-prompt-caching-changes-the-rules" class="header-anchor"&gt;&lt;/a&gt;But Prompt Caching Changes the Rules
&lt;/h2&gt;&lt;p&gt;Anthropic introduced prompt caching in 2024, and Claude Code enables it by default. The rule is simple: &lt;strong&gt;identical prefixes only cost 10% of the normal price&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Sonnet 4.6 pricing:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Type&lt;/th&gt;
 &lt;th&gt;Price (per million tokens)&lt;/th&gt;
 &lt;th&gt;Relative&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Base input (uncached)&lt;/td&gt;
 &lt;td&gt;$3.00&lt;/td&gt;
 &lt;td&gt;100%&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5-minute cache write&lt;/td&gt;
 &lt;td&gt;$3.75&lt;/td&gt;
 &lt;td&gt;125%&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;1-hour cache write&lt;/td&gt;
 &lt;td&gt;$6.00&lt;/td&gt;
 &lt;td&gt;200%&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Cache read&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;$0.30&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;10%&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Opus follows the same 10% ratio with larger absolute savings: base input $5, cache read only $0.50.&lt;/p&gt;
&lt;p&gt;Meaning: the first time you send a large context block, it gets written to the cache and you pay a small write premium (25% above base). For the next 5 minutes, resending the same prefix costs 10% of base. The longer the session and the more cache hits accumulate, the lower your average per-token cost.&lt;/p&gt;
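&lt;p&gt;The arithmetic is worth sketching. Below is a back-of-the-envelope model using the Sonnet rates from the table above; real bills also depend on where cache breakpoints land, and billing new tokens at the write rate is an assumption made here so the next turn can read them back cheaply.&lt;/p&gt;

```python
# Toy per-turn input-cost model under prompt caching.
# Rates: Sonnet USD per million tokens, from the pricing table above.
BASE = 3.00    # uncached input
WRITE = 3.75   # 5-minute cache write (125% of base)
READ = 0.30    # cache read (10% of base)

def turn_cost(cached_prefix: int, new_tokens: int) -> dict:
    """Input cost of one turn, with and without caching.

    Assumption: new tokens are billed at the cache-write rate so that
    the next turn can read them back at 10% of base.
    """
    without = (cached_prefix + new_tokens) * BASE / 1e6
    cached = (cached_prefix * READ + new_tokens * WRITE) / 1e6
    return {"without_cache": without, "with_cache": cached}

# A session holding 50K tokens of history, adding 2K of new input:
print(turn_cost(50_000, 2_000))
```

&lt;p&gt;With these numbers the cached turn costs roughly $0.02 against roughly $0.16 uncached, the same order of magnitude as the 7x example later in the article.&lt;/p&gt;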
&lt;h2 id="what-prefix-actually-means"&gt;&lt;a href="#what-prefix-actually-means" class="header-anchor"&gt;&lt;/a&gt;What &amp;ldquo;Prefix&amp;rdquo; Actually Means
&lt;/h2&gt;&lt;p&gt;Before going further, it&amp;rsquo;s worth unpacking the word &amp;ldquo;prefix.&amp;rdquo; A prompt is an ordered sequence of tokens. Cache matching runs &lt;strong&gt;from the very beginning, token by token&lt;/strong&gt; — and a single differing token breaks everything after it.&lt;/p&gt;
&lt;p&gt;In multi-turn conversations, every new turn &lt;strong&gt;only appends to the tail&lt;/strong&gt;; the prior history stays untouched:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Turn 1: [system] [CLAUDE.md] [Q1]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Turn 2: [system] [CLAUDE.md] [Q1] [A1] [Q2]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↑ identical prefix → cache hit at 10% price
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↑ new tail → written to cache
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Turn 3: [system] [CLAUDE.md] [Q1] [A1] [Q2] [A2] [Q3]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↑ an even longer prefix hits cache
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;So long conversations &lt;strong&gt;aren&amp;rsquo;t a cost disadvantage — they&amp;rsquo;re an advantage&lt;/strong&gt;. The longer the accumulated history, the more tokens per turn get the 90% discount.&lt;/p&gt;
&lt;p&gt;But this only holds while you&amp;rsquo;re strictly appending. If you could go back and edit Turn 5, everything after Turn 5 would be invalidated, even tokens that look identical, because the prefix match diverges from that point onward. That&amp;rsquo;s the cruelty of &amp;ldquo;prefix&amp;rdquo;: change one character in the middle and everything downstream is lost.&lt;/p&gt;
&lt;p&gt;Analogy: git commit hashes. Tweak any historical commit and every hash after it changes.&lt;/p&gt;
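&lt;p&gt;The matching rule is mechanical enough to illustrate with a toy function (this sketches the assumed behavior, not Anthropic&amp;rsquo;s actual implementation): only the longest common prefix between the previous prompt and the current one can be served from cache.&lt;/p&gt;

```python
# Toy prefix matching: how many leading "tokens" survive between prompts.
def common_prefix_len(prev: list, cur: list) -> int:
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n

turn2  = ["system", "CLAUDE.md", "Q1", "A1", "Q2"]
turn3  = ["system", "CLAUDE.md", "Q1", "A1", "Q2", "A2", "Q3"]
edited = ["system", "CLAUDE.md", "Q1-edited", "A1", "Q2", "A2", "Q3"]

print(common_prefix_len(turn2, turn3))   # 5: pure append, whole history hits
print(common_prefix_len(turn3, edited))  # 2: one edit, everything after is lost
```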
&lt;h2 id="topic-switching-how-cache-bills-across-a--b--c"&gt;&lt;a href="#topic-switching-how-cache-bills-across-a--b--c" class="header-anchor"&gt;&lt;/a&gt;Topic Switching: How Cache Bills Across A → B → C
&lt;/h2&gt;&lt;p&gt;The most-overlooked scenario: you discuss topic A with Claude, finish, move to topic B, then topic C — &lt;strong&gt;without &lt;code&gt;/clear&lt;/code&gt; in between&lt;/strong&gt;. A&amp;rsquo;s and B&amp;rsquo;s histories stay glued to the prompt prefix, getting billed at 10% on every single turn while they ride along.&lt;/p&gt;
&lt;p&gt;Concretely:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Topic A (30K tokens accumulated over 10 turns)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; → A&amp;#39;s 30K written to cache
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Switch to B (no /clear)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Turn 11 = [A&amp;#39;s 30K] + [B&amp;#39;s new question]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↑ 30K × $0.30/M = $0.009 from cache
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;B accumulates 20K more
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Switch to C (still no /clear)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Every turn = [A&amp;#39;s 30K] + [B&amp;#39;s 20K] + [C&amp;#39;s new question]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↑ 50K from cache ≈ $0.015 / turn
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;20 turns on C means an extra 20 × $0.015 = $0.30 spent &amp;ldquo;carrying corpses.&amp;rdquo; A and B may contribute nothing to C, but you&amp;rsquo;re paying for them to ride along.&lt;/p&gt;
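&lt;p&gt;That ride-along cost is a one-line estimate. A minimal helper, using the cache-read rate from the table above:&lt;/p&gt;

```python
# Cache-read cost of dragging stale history through N more turns.
CACHE_READ = 0.30  # Sonnet cache read, USD per million tokens

def carry_cost(stale_tokens: int, turns: int) -> float:
    return stale_tokens * CACHE_READ / 1e6 * turns

# A (30K) plus B (20K) riding along for 20 turns of topic C:
print(round(carry_cost(50_000, 20), 2))  # 0.3, matching the $0.30 above
```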
&lt;p&gt;&lt;strong&gt;Rule of thumb for when to &lt;code&gt;/clear&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A, B, C are independent&lt;/strong&gt; (frontend in the morning / SQL in the afternoon / CI at night) → &lt;code&gt;/clear&lt;/code&gt; between topics&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A, B, C reference each other&lt;/strong&gt; (A defines spec / B implements / C debugs B) → don&amp;rsquo;t clear; the 10% price on history is cheap and useful&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;History is heavy but its conclusion is condensable&lt;/strong&gt; (A was a 50K doc you read) → clear, then paste a short summary of A&amp;rsquo;s conclusions as new context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A common misconception: &amp;ldquo;Claude Code automatically detects topic changes and drops stale content.&amp;rdquo; &lt;strong&gt;It doesn&amp;rsquo;t.&lt;/strong&gt; Cache is mechanical prefix matching — it has no semantic understanding. Deciding what to forget is &lt;strong&gt;entirely a human responsibility&lt;/strong&gt;: either &lt;code&gt;/clear&lt;/code&gt; manually, or let auto-compact fire based on context usage (not topic).&lt;/p&gt;
&lt;h2 id="the-three-variables-that-actually-drive-cost"&gt;&lt;a href="#the-three-variables-that-actually-drive-cost" class="header-anchor"&gt;&lt;/a&gt;The Three Variables That Actually Drive Cost
&lt;/h2&gt;&lt;p&gt;So the cost model isn&amp;rsquo;t &amp;ldquo;context size × number of turns.&amp;rdquo; It&amp;rsquo;s these three factors:&lt;/p&gt;
&lt;h3 id="1-cache-hit-rate"&gt;&lt;a href="#1-cache-hit-rate" class="header-anchor"&gt;&lt;/a&gt;1. Cache Hit Rate
&lt;/h3&gt;&lt;p&gt;In a long, continuous session, every turn&amp;rsquo;s prefix hits the cache written by the previous turn. If a session has accumulated 50K tokens and turn 11 adds 2K new input:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Without cache: 51K × $3 = $0.153&lt;/li&gt;
&lt;li&gt;With cache: 50K × $0.30 + 2K × $3 = $0.021&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;About a &lt;strong&gt;7x difference&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The worst part of aggressive &lt;code&gt;/clear&lt;/code&gt;&lt;/strong&gt;: every new session re-reads &lt;code&gt;CLAUDE.md&lt;/code&gt;, re-learns your project files, and re-warms the cache. These warm-up costs can easily exceed the &amp;ldquo;savings&amp;rdquo; from keeping context small.&lt;/p&gt;
&lt;h3 id="2-cache-invalidation"&gt;&lt;a href="#2-cache-invalidation" class="header-anchor"&gt;&lt;/a&gt;2. Cache Invalidation
&lt;/h3&gt;&lt;p&gt;Cache requires &lt;strong&gt;100% identical prefixes&lt;/strong&gt; to hit. These actions invalidate it — some loudly, some quietly:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Event&lt;/th&gt;
 &lt;th&gt;Impact&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Editing message N&lt;/td&gt;
 &lt;td&gt;Everything from N onward invalidates (earlier still cached)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Adding/removing an MCP tool&lt;/td&gt;
 &lt;td&gt;Full invalidation (tool schemas sit at the front)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Switching Sonnet ↔ Opus&lt;/td&gt;
 &lt;td&gt;Different model, different cache — starts over&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Toggling web search / citations&lt;/td&gt;
 &lt;td&gt;system + message cache invalidates&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Idle &amp;gt; 5 minutes (TTL expires)&lt;/td&gt;
 &lt;td&gt;Cache evaporates; next call pays 100% to rewrite&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Auto-compact fires&lt;/td&gt;
 &lt;td&gt;Prefix is replaced by a summary; subsequent turns warm a fresh cache&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;/clear&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Everything resets&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Idle-over-5-minutes is the sneakiest — grab lunch, come back, type a question, and you&amp;rsquo;ve quietly paid full write price without any UI warning.&lt;/p&gt;
&lt;p&gt;A nuance about auto-compact worth clarifying: per Anthropic&amp;rsquo;s implementation, the &lt;strong&gt;compaction API call itself&lt;/strong&gt; sends the same prefix as the turn before it, so that call is a cache hit. The real cost lands &lt;strong&gt;after&lt;/strong&gt; compaction — the new session uses the summary in place of the original history as its prefix, so every subsequent turn is warming a brand-new cache from that point on. The net cost is similar to &amp;ldquo;cache blown away,&amp;rdquo; but the mechanism is prefix replacement, not cache invalidation.&lt;/p&gt;
&lt;h3 id="3-ttl-5-minutes-vs-1-hour"&gt;&lt;a href="#3-ttl-5-minutes-vs-1-hour" class="header-anchor"&gt;&lt;/a&gt;3. TTL (5 Minutes vs 1 Hour)
&lt;/h3&gt;&lt;p&gt;Default cache TTL is 5 minutes. Pause for more than 5 minutes and the cache expires — the next call pays full base input price.&lt;/p&gt;
&lt;p&gt;Anthropic offers a 1-hour TTL option at 2x write cost ($6 vs $3) in exchange for longer persistence. Whether it&amp;rsquo;s worth it depends on rhythm — bursty work with 10–30 minute gaps may benefit; continuous work never hits the timeout anyway.&lt;/p&gt;
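&lt;p&gt;A rough break-even sketch of that trade-off. One assumption is made explicit here: each 5-minute expiry is modeled as a full re-write of the prefix at the write rate, which is an approximation of the &amp;ldquo;pays 100% to rewrite&amp;rdquo; behavior described above.&lt;/p&gt;

```python
# When does the 1-hour TTL (2x write price) beat the default 5-minute TTL?
WRITE_5M = 3.75  # 5-minute cache write, USD per M tokens
WRITE_1H = 6.00  # 1-hour cache write

def write_spend(prefix_tokens: int, expiries: int, one_hour: bool) -> float:
    """Cache-write spend for one stable prefix across a work session.

    Assumption: each 5-minute expiry forces a full re-write of the prefix.
    """
    if one_hour:
        return prefix_tokens * WRITE_1H / 1e6  # written once, survives gaps
    return prefix_tokens * WRITE_5M / 1e6 * (1 + expiries)

# 100K-token prefix, two idle gaps longer than 5 minutes:
print(write_spend(100_000, 2, one_hour=False))  # 1.125
print(write_spend(100_000, 0, one_hour=True))   # 0.6
```

&lt;p&gt;Under these assumptions, even a single idle expiry already makes the 1-hour TTL the cheaper choice for that prefix, which matches the &amp;ldquo;bursty work benefits, continuous work does not&amp;rdquo; intuition.&lt;/p&gt;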
&lt;h2 id="counterintuitive-when-long-sessions-are-cheapest"&gt;&lt;a href="#counterintuitive-when-long-sessions-are-cheapest" class="header-anchor"&gt;&lt;/a&gt;Counterintuitive: When Long Sessions Are Cheapest
&lt;/h2&gt;&lt;p&gt;Combine all three and you arrive at the opposite of &amp;ldquo;longer = more expensive&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Long sessions are cheapest when&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Work is continuous, turns within 5 minutes of each other&lt;/li&gt;
&lt;li&gt;No editing of history, no model switches, no MCP churn&lt;/li&gt;
&lt;li&gt;Context stays below the compaction threshold (~155K safe zone)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Short sessions / frequent clearing are costliest when&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Every new session re-reads large context (CLAUDE.md, multiple files, skill definitions)&lt;/li&gt;
&lt;li&gt;Every new session pays a &amp;ldquo;cache warm-up tax&amp;rdquo;&lt;/li&gt;
&lt;li&gt;You never reap the 10% cache-read discount&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My own experience: two hours of continuous work in one session often costs less than splitting the same work into four independent 30-minute sessions — because the latter pays four cold starts.&lt;/p&gt;
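&lt;p&gt;That experience can be reproduced in a toy model of cold starts. The sketch below assumes each fresh session re-reads a fixed warm-up context (CLAUDE.md, project files) and that every turn reads the full accumulated prefix from cache; the token counts are illustrative, not measured.&lt;/p&gt;

```python
# One long session vs several short ones, counting warm-up writes,
# cache reads of the growing prefix, and writes of new tokens.
WRITE, READ = 3.75, 0.30  # Sonnet USD per M tokens, from the table above

def session_cost(warmup: int, turns: int, new_per_turn: int) -> float:
    cost = warmup * WRITE / 1e6      # cold start: warm-up context written once
    prefix = warmup
    for _ in range(turns):
        cost += prefix * READ / 1e6          # history read back at 10%
        cost += new_per_turn * WRITE / 1e6   # new tokens written to cache
        prefix += new_per_turn
    return cost

# 50K warm-up, 1K of new tokens per turn, 40 turns of work in total:
one_long   = session_cost(50_000, 40, 1_000)
four_short = 4 * session_cost(50_000, 10, 1_000)
print(four_short > one_long)  # True: four cold starts outweigh the longer prefix
```

&lt;p&gt;When the warm-up context is small relative to the new work per session, the ordering can flip, which is exactly why the &lt;code&gt;/clear&lt;/code&gt; decision depends on how heavy the shared context is.&lt;/p&gt;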
&lt;h2 id="when-large-context-is-genuinely-a-problem"&gt;&lt;a href="#when-large-context-is-genuinely-a-problem" class="header-anchor"&gt;&lt;/a&gt;When Large Context &lt;strong&gt;Is&lt;/strong&gt; Genuinely a Problem
&lt;/h2&gt;&lt;p&gt;None of this means context can grow forever with no consequence. Two thresholds turn &amp;ldquo;large context&amp;rdquo; from a cost problem into a &lt;strong&gt;quality problem&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Approaching the context window limit&lt;/strong&gt; (Sonnet 200K / 1M, Opus 200K)&lt;/p&gt;
&lt;p&gt;Model attention degrades past ~100K tokens, especially on content in the middle (the &amp;ldquo;lost in the middle&amp;rdquo; phenomenon). At this point the concern isn&amp;rsquo;t cost — it&amp;rsquo;s that the model &lt;strong&gt;can&amp;rsquo;t find or misuses&lt;/strong&gt; what you gave it earlier.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Auto-compact triggers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Claude Code auto-compacts as you approach the limit. Compaction is a major operation: the original history is replaced by a summary, subsequent turns must warm a fresh cache (so cost spikes), and the summary may lose details.&lt;/p&gt;
&lt;p&gt;So context shouldn&amp;rsquo;t grow unbounded, but the right reset trigger is &amp;ldquo;task complete&amp;rdquo; or &amp;ldquo;about to hit compaction,&amp;rdquo; not &amp;ldquo;session has been open for X hours.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="practical-recommendations"&gt;&lt;a href="#practical-recommendations" class="header-anchor"&gt;&lt;/a&gt;Practical Recommendations
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Situation&lt;/th&gt;
 &lt;th&gt;Recommendation&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Mid-task&lt;/td&gt;
 &lt;td&gt;Don&amp;rsquo;t &lt;code&gt;/clear&lt;/code&gt;, continue the session&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Task done, starting a new one&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;/clear&lt;/code&gt; so the next session starts clean&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Idle for &amp;gt;5 minutes&lt;/td&gt;
 &lt;td&gt;Use &lt;code&gt;/resume&lt;/code&gt; instead of opening a new session (TTL expires but history is preserved)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Code open often with idle gaps&lt;/td&gt;
 &lt;td&gt;Consider 1h TTL — 2x write cost but idle safety&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Context exceeds 155K&lt;/td&gt;
 &lt;td&gt;Proactively end the session; don&amp;rsquo;t wait for auto-compact&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;To measure your actual cost, try &lt;a class="link" href="https://github.com/ryoppippi/ccusage" target="_blank" rel="noopener"
 &gt;ccusage&lt;/a&gt; or &lt;a class="link" href="https://recca0120.github.io/en/2026/04/07/claude-view-mission-control/" &gt;claude-view&lt;/a&gt;. A high share of &lt;code&gt;cache_read_input_tokens&lt;/code&gt; means you&amp;rsquo;re working efficiently; rising &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; with low reads means cache keeps invalidating — you&amp;rsquo;re burning money.&lt;/p&gt;
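&lt;p&gt;If you log raw usage records yourself, the health check is one division. The field names below are the usage counters mentioned above; the exact threshold for &amp;ldquo;healthy&amp;rdquo; is a rule of thumb, not an official number.&lt;/p&gt;

```python
# Share of input tokens served from cache in one API usage record.
def cache_read_share(usage: dict) -> float:
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)
    total = read + created + uncached
    return read / total if total else 0.0

usage = {
    "input_tokens": 2_000,
    "cache_read_input_tokens": 50_000,
    "cache_creation_input_tokens": 3_000,
}
print(round(cache_read_share(usage), 2))  # 0.91
```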
&lt;p&gt;&amp;ldquo;Longer sessions waste more tokens&amp;rdquo; is a stateless-era intuition, but prompt caching has been rewriting those rules for two years. Check how you actually use Claude Code — the token savings might surprise you.&lt;/p&gt;
&lt;h2 id="references"&gt;&lt;a href="#references" class="header-anchor"&gt;&lt;/a&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching" target="_blank" rel="noopener"
 &gt;Prompt Caching — Claude API Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://code.claude.com/docs/en/costs" target="_blank" rel="noopener"
 &gt;Manage Costs Effectively — Claude Code Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.claudecodecamp.com/p/how-prompt-caching-actually-works-in-claude-code" target="_blank" rel="noopener"
 &gt;How Prompt Caching Actually Works in Claude Code — Claude Code Camp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.mindstudio.ai/blog/claude-code-context-compounding-explained-2" target="_blank" rel="noopener"
 &gt;How Context Compounding Works in Claude Code — MindStudio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ryoppippi/ccusage" target="_blank" rel="noopener"
 &gt;ccusage — Claude Code Token Usage CLI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>