Analysis

Claude 4.6’s 1M Context Isn’t an Upgrade — It’s a Billing Model

How a simple “What is 1 + 1?” can cost you four dollars, and what Anthropic’s release page quietly revealed about the real price of a million tokens.

By Marcin Cebula·March 15, 2026

When Anthropic announced Claude 4.6 with a one-million-token context window, the reaction was predictable. Influencers raced to stuff entire codebases into Claude. Creators dumped twenty books of data into a single session. The headline wrote itself: bigger number, bigger model, bigger breakthrough.

But the more I read Anthropic’s release post, the stranger it felt. Not because of what they said — but because of how they said it. And, more importantly, because of what they chose not to say at all.

* * *

The Phrase That Hides Everything

Buried in the release language is a phrase that sounds reassuring: you can use the full one-million-token window at “standard pricing.” For most readers, that’s where the thinking stops. Standard pricing. Nothing to worry about.

But anyone who has actually worked with large context models knows that phrase conceals a critical detail. With large context windows, you don’t just pay for the new tokens you add to a conversation. You pay for every token already sitting in the context window. Every single one. Every time you send a message.

And once you understand that, something interesting starts to happen.

The Four-Dollar Question

Consider a simple thought experiment. You open a fresh Claude session and type: What is 1 + 1? Claude returns 2. The cost of that interaction is almost nothing — a fraction of a penny. The kind of number that rounds to zero on any invoice.

Now fast-forward. You’ve been working in Claude Code for a few hours. You’ve loaded half your codebase into the context. You’re thirty turns deep. Your session is sitting at 700,000 tokens. For fun, you ask the exact same question: What is 1 + 1?

The answer is still 2. The math hasn’t changed. But the cost has. That same question now runs you about forty cents — and that’s actually the best-case scenario, because it assumes your context is still cached.

The same question — “What is 1 + 1?” — at different context sizes:

- Fresh session (near-zero context): < $0.001
- 700k tokens, cached (Sonnet 4.6): $0.21
- 700k tokens, uncached (Sonnet 4.6): $2.10
- 700k tokens, uncached (Opus 4.6): $3.50

The worst case? Four dollars. Your little arithmetic question just cost you the price of a slice of New York City pizza. The same question, asked the same way, to the same model — separated only by how much context had accumulated in the window.
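The numbers above fall out of simple arithmetic. Here is a minimal sketch, using per-token rates back-derived from the figures in the table (roughly $3/MTok uncached and $0.30/MTok cached input for Sonnet 4.6, and $5/MTok for Opus 4.6; these are illustrative assumptions, not an official price sheet):

```python
# Per-turn input cost: every token already sitting in the window is billed
# again on every message. Rates (USD per million input tokens) are
# back-derived from the figures above, not taken from an official price sheet.
RATES = {
    ("sonnet", "cached"): 0.30,
    ("sonnet", "uncached"): 3.00,
    ("opus", "uncached"): 5.00,
}

def turn_cost(context_tokens: int, model: str, cache: str) -> float:
    """Input cost of one message with context_tokens already in the window."""
    return context_tokens * RATES[(model, cache)] / 1_000_000

print(turn_cost(700_000, "sonnet", "cached"))    # ≈ $0.21
print(turn_cost(700_000, "sonnet", "uncached"))  # ≈ $2.10
print(turn_cost(700_000, "opus", "uncached"))    # ≈ $3.50
```

The point of the sketch is the shape of the function: cost scales linearly with accumulated context, regardless of how trivial the new question is.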

Before anyone reads this as doom-and-gloom: it isn’t. I use Claude Code every single day. It’s one of the best developer tools I’ve ever worked with, hands down. And to be fair to Anthropic, their KV cache pricing is genuinely well designed. But a few things about this specific release made me pause.

The Disappearing Compaction Bar

Right after Claude 4.6 introduced the million-token window, the token compaction bar — the visual indicator showing how full your context was — disappeared from the interface. At the same time, the release page deployed some carefully chosen language around pricing. Nothing explicitly misleading. Just enough ambiguity to make you look twice if you were paying attention.

More concerning is the default auto-compaction threshold, which Anthropic set to trigger at 95% context usage. At 200k tokens, that was a reasonable default: compaction kicked in around 190,000 tokens, leaving plenty of room to work. But at one million tokens, the same 95% threshold lets roughly 950,000 tokens pile up before compaction fires, deep into the territory where both cost and accuracy suffer. That default makes sense in a handful of long-context workflows and very little sense in most others.
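You can see why the unchanged threshold matters by working out where it lands and what a single turn costs just before it fires. A quick sketch, again using the $3/MTok Sonnet-class input rate implied by this article's figures (an assumption, not a price sheet):

```python
# Where the default 95% auto-compaction threshold lands, old window vs. new,
# and what one uncached turn costs right before it fires. The $3/MTok Sonnet
# input rate is back-derived from this article's figures, not a price sheet.
SONNET_INPUT_PER_MTOK = 3.00

def compaction_point(window_tokens: int, threshold: float = 0.95) -> int:
    """Tokens that accumulate before auto-compaction triggers."""
    return int(window_tokens * threshold)

def worst_turn_cost(window_tokens: int, threshold: float = 0.95) -> float:
    """Uncached input cost of one turn just below the compaction point."""
    return compaction_point(window_tokens, threshold) * SONNET_INPUT_PER_MTOK / 1_000_000

for window in (200_000, 1_000_000):
    print(f"{window:,} window: compaction at {compaction_point(window):,} tokens, "
          f"~${worst_turn_cost(window):.2f} per uncached turn")
```

At the old window, the threshold capped your worst uncached turn at roughly $0.57. At one million tokens, the same percentage allows turns around $2.85 before compaction ever intervenes.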

And there’s no configuration option to adjust it. No way to set a custom compaction threshold for your session. You have to remember to manage it manually — and hope you don’t forget.

The Cache Time Bomb

To understand why the costs spike so dramatically, you need to understand how the KV cache works.

Picture this: you’ve spent four hours vibe-coding — which, in practice, means about ten minutes of actual productive work — and you step out for lunch. According to Anthropic’s documentation, the KV cache expires after roughly five minutes of inactivity. In my testing it sometimes lasted longer, but five minutes is the number you should plan around.

When you return, those five minutes have passed. Your cache has expired. You ask your next question. Same session. Same context. But now Claude has to reload and process all 700,000 tokens from scratch. No caching discount. Full price for every token in the window.

With Sonnet 4.6, that reload costs roughly $2.10 before you even receive an answer. With Opus 4.6, it’s closer to $3.50. If you’re the kind of developer who takes five-minute breaks between turns at a high token count, you’re going to have a very expensive afternoon.
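The lunch-break scenario is easy to model. A rough sketch, assuming the roughly five-minute TTL described above and the Sonnet-class rates implied earlier in the piece (both assumptions rather than official numbers):

```python
CACHE_TTL_SECONDS = 5 * 60  # ~5 minutes of inactivity, per the docs cited above

def next_turn_input_cost(context_tokens: int, idle_seconds: float,
                         cached_rate: float = 0.30,
                         uncached_rate: float = 3.00) -> float:
    """Estimate the input bill (USD) for the next message after sitting idle.

    Rates are illustrative Sonnet-class USD-per-MTok figures, back-derived
    from the numbers in this article.
    """
    rate = cached_rate if idle_seconds < CACHE_TTL_SECONDS else uncached_rate
    return context_tokens * rate / 1_000_000

print(next_turn_input_cost(700_000, idle_seconds=60))       # back in a minute: ≈ $0.21
print(next_turn_input_cost(700_000, idle_seconds=45 * 60))  # after lunch: ≈ $2.10
```

The step function is the whole story: the same message, in the same session, costs ten times more because the clock crossed a five-minute boundary.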

“More context. More problems, Baby!” — Me

IMO, the 200k Boundary Was a Feature — Not a Bug

The previous 200k context window wasn’t just a technical limitation. It was a natural boundary — a point where developers were forced to compress, reset, and think strategically about what they were feeding Claude. That constraint created a mental model, a discipline. The window was large enough to handle complex work, but it required intention. You had to plan. You had to optimize. You couldn’t dump your entire repository into the session and hope for the best.

In software engineering, we have a foundational principle: “garbage in, garbage out.” With large language models, that principle matters ten times more, because what you feed the model directly determines what you get back. You can see this in the explosion of roles in Context Engineering and ContextOps — emerging disciplines focused on giving an LLM exactly what it needs, when it needs it, and keeping the context clean. Outside of architectural choices, context quality is currently one of the biggest levers you can pull for the pace, quality, and security of your code.

From the author

Roles like Context Engineer, AI Agent Orchestration Specialist, and ContextOps Lead didn’t exist two years ago. Now they’re some of the fastest-growing positions in tech, with average salaries north of $190K. The agentic AI wave is creating entirely new career paths — and there’s no dedicated place to find them.

That’s why I’m building AgenticJobs.co — the first job board built specifically for humans who work in agentic AI. Not a marketplace for hiring AI agents. A place for companies to find the people who build, manage, and orchestrate them.

If you’re looking for a job in emerging tech or software, or if you’re a company hiring for these roles, consider being an early adopter and supporting the project. You’ll be first on the list to post when we launch.

There is something genuinely elegant about giving Claude exactly what it needs — no noise, no cruft, no bloated context. Just the information and the instruction. When you get this right, Claude can solve a problem in a single shot. Not just with an answer, but with a correct implementation — one that respects your existing architecture and follows your coding standards. It’s easy to one-shot a solution. It’s a different thing entirely to one-shot an implementation that fits correctly into a complex codebase.

When context becomes cluttered, the opposite happens. Claude loses track of what you actually asked for. It rewrites code you never told it to touch. It wanders through your files on multi-turn detours, modifying things at random. If you’ve ever wondered why Claude decided to refactor half your project when all you asked for was a bug fix — the answer is almost certainly context pollution.

Context Rot: The Benchmark Anthropic Didn’t Mean to Publish

Researchers call this degradation “context rot,” sometimes referred to as the “lost in the middle” problem. As the context window fills, the model’s ability to attend to and weight information effectively declines. Accuracy drops. Relevance drifts. The model starts making decisions based on noise rather than signal.

And Anthropic, perhaps inadvertently, proved this with their own data.

Alongside the Opus 4.6 release, Anthropic published a benchmark chart — ostensibly to demonstrate how much better Claude performs than OpenAI’s models across different context sizes. On the surface, it succeeded. Claude does outperform the competition at scale. But the chart also revealed exactly the problem we’re discussing.

Accuracy at 250k vs. 1M tokens of context:

- Opus 4.6: 91.9% at 250k → 78.3% at 1M (a drop of 13.6 points)
- Sonnet 4.6: 90.6% at 250k → 65.1% at 1M (a drop of 25.5 points)

A 25-point accuracy drop for Sonnet. Nearly 14 points for Opus. These aren’t rounding errors. These are significant performance degradations — baked into the product, visible in Anthropic’s own benchmarks, and easy to miss if you only skimmed the release page.

I understand what Anthropic was going for: “Look how much better we are than the competition at a million tokens.” And they’re right — they are better. But in making that case, they also exposed exactly why filling a million-token window is dangerous if you don’t understand what’s happening underneath.

* * *

What This Actually Means for Developers

The takeaway here isn’t that the million-token window is useless. It isn’t. For very specific use cases — holding a large codebase in context because the 200k window keeps auto-compacting, working through complex multi-file documentation, maintaining long architectural conversations — it’s a meaningful capability.

But treating it as a dumping ground, the way most early adopters did, is not engineering. It’s a fundamental misunderstanding of how the technology works.

You don’t need to be Andrej Karpathy to use Claude effectively. But you do need to understand the basic mechanics of what’s happening when you interact with a large language model. You need to know that every token in your context costs money on every turn. You need to know that cached tokens expire. You need to know that accuracy degrades as context grows.

When you understand these things, something shifts. You stop being frustrated when Claude does something unexpected, because you can actually predict what it will do. You start keeping your context intentional. When you finish one problem and move to something unrelated, you clear the window instead of carrying forward 500,000 tokens of irrelevant history.

And you definitely stop using Claude Code for basic arithmetic with 900,000 tokens of context, at $1 a message.

1 + 1 = $1
* actual cost may vary by context window size
The views expressed in this article are the author’s own. Cost calculations are based on publicly available Anthropic API pricing as of March 2026.