Stop chunking, start counting: when 1M context beats your RAG pipeline

May 31, 2026 · 8 min read · guides, models, rag

When Claude 4.6 Opus shipped a 1M-token context window, the hot takes split into two camps: "RAG is dead" and "nothing changes." Both are wrong, and the actual answer is a number.

We re-ran our document-analysis comparison across corpus sizes from 5 to 500 documents. Below roughly 50 documents, single-pass analysis beat our tuned retrieval pipeline on accuracy, setup time, and total cost. Above that threshold, retrieval pulled away, decisively so at volume.

The break-even logic

Retrieval pipelines have a fixed cost: chunking strategy, embedding refresh, index maintenance, and the failure modes none of the tutorials mention. That cost only amortizes when you query the same corpus repeatedly or the corpus outgrows context.

Single-pass has the opposite profile: zero setup, but you pay full token price every run. For a 30-document due-diligence review you run once, the pipeline never pays for itself.
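The break-even reasoning above can be written as a small cost model. This is a minimal sketch, not the model behind our comparison: the function names and every number in the usage example are placeholder assumptions, so substitute your own setup cost and per-query prices.

```python
import math
from typing import Optional


def total_cost(fixed_cost: float, cost_per_query: float, queries: int) -> float:
    """Total spend for an approach: one-time setup plus per-query cost."""
    return fixed_cost + cost_per_query * queries


def break_even_queries(
    pipeline_setup_cost: float,
    single_pass_cost_per_query: float,
    retrieval_cost_per_query: float,
) -> Optional[int]:
    """Smallest query count at which retrieval costs no more than single pass.

    Single pass is modeled with zero setup cost and a full-context price per
    run; retrieval with a fixed setup cost and a cheaper per-query price.
    Returns None when retrieval is not cheaper per query, because the setup
    cost then never amortizes.
    """
    saving_per_query = single_pass_cost_per_query - retrieval_cost_per_query
    if saving_per_query <= 0:
        return None
    return math.ceil(pipeline_setup_cost / saving_per_query)


# Hypothetical figures, for illustration only.
n = break_even_queries(
    pipeline_setup_cost=400.0,
    single_pass_cost_per_query=2.5,
    retrieval_cost_per_query=0.5,
)
print(n)  # 200
```

With these made-up figures the pipeline needs 200 queries against the same corpus before it pays for itself, so a review you run once sits far on the single-pass side. The model only covers cost; it ignores accuracy and the case where the corpus no longer fits in context.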

What to do with this

Count your documents and count your repeat queries. One-shot analysis under 50 documents: single pass, no pipeline. Living corpus or high query volume: retrieval still wins and the new context window changes nothing.
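The rule of thumb above, written as a function. Treat it as a sketch under stated assumptions: the 50-document threshold comes from our runs on our corpus, while `high_volume_queries` and the middle "estimate break-even" outcome are hypothetical additions for cases the rule does not settle.

```python
def choose_approach(
    num_documents: int,
    expected_queries: int,
    living_corpus: bool = False,
    doc_threshold: int = 50,          # rough threshold from our comparison
    high_volume_queries: int = 100,   # assumption: define "high volume" yourself
) -> str:
    """Pick an approach from document count and repeat-query count."""
    # A living corpus, high query volume, or a large corpus favors retrieval.
    if living_corpus or expected_queries >= high_volume_queries:
        return "retrieval"
    if num_documents >= doc_threshold:
        return "retrieval"
    # Small, static corpus analyzed once: skip the pipeline.
    if expected_queries <= 1:
        return "single-pass"
    # Small corpus with a handful of repeat queries: the rule does not decide,
    # so compare setup cost against per-query savings.
    return "estimate break-even"


print(choose_approach(num_documents=30, expected_queries=1))
print(choose_approach(num_documents=500, expected_queries=1))
print(choose_approach(num_documents=30, expected_queries=10))
```

The three calls cover a one-off 30-document review, a 500-document corpus, and a small corpus with a few repeat queries.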

The full comparison data and our updated verdict live in the changelog — and the verdict will move again when the economics do.
