Week of April 20, 2026
Anthropic ships Sonnet 4.6 with 90% cheaper prompt caching. OpenAI quietly raises GPT-5 mini context to 256K. AWS Bedrock adds Knowledge Base agentic retrieval as GA.
Sonnet 4.6 prompt caching now cuts cached tokens by 90%
Anthropic dropped cached input pricing on Sonnet 4.6 from 25% to 10% of the base input rate. For RAG agents that reuse system prompts and retrieved context across many tool-calling iterations, real-world spend drops 30 to 70%. The 5-minute cache TTL is unchanged. Worth enabling by default for any production agent that has not already turned it on.
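A minimal sketch of how an agent would flag its reusable system prompt for caching via the Messages API's cache_control block. The model id "claude-sonnet-4-6" and the helper name are assumptions; the helper just builds the request kwargs you would pass to the SDK.

```python
# Sketch: marking a reusable system prompt for Anthropic prompt caching.
# The model id "claude-sonnet-4-6" is an assumption; cache_control is the
# Messages API mechanism for flagging a cacheable content block.

def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Build Messages API kwargs with the system prompt flagged for caching.

    Cached reads now bill at 10% of the base input rate, so any block
    reused across tool-calling iterations should carry cache_control.
    """
    return {
        "model": "claude-sonnet-4-6",  # assumed model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Flags this block for the 5-minute ephemeral cache.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

# In a real agent, pass the kwargs straight to the SDK:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_cached_request(prompt, msg))
```

Subsequent iterations that send the identical system block within the TTL read it from cache instead of paying full input price.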
Anthropic pricing docs
GPT-5 mini context window quietly raised to 256K
OpenAI raised the gpt-5-mini context window to 256K tokens with no announcement, just a model card refresh. Per-token pricing is unchanged. Practical impact: classification and routing agents that previously fell back to GPT-5.4 for long inputs can now stay on the cheaper model. Watch your evals, though: a bigger context window does not guarantee better attention to far-back tokens.
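The routing decision above can be sketched as a simple token-budget check. The threshold constants, response budget, and the ~4-characters-per-token heuristic are assumptions; a production router should count tokens with a real tokenizer (e.g. tiktoken).

```python
# Sketch of a model-routing check that keeps requests on gpt-5-mini now
# that its window is 256K. The response budget and the ~4 chars/token
# heuristic are assumptions; use a real tokenizer in production.

GPT5_MINI_CONTEXT = 256_000   # new gpt-5-mini window, in tokens
RESPONSE_BUDGET = 8_000       # tokens reserved for the completion (assumed)

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4 + 1

def pick_model(prompt: str) -> str:
    """Stay on the cheaper model unless the prompt would overflow its window."""
    if estimate_tokens(prompt) + RESPONSE_BUDGET <= GPT5_MINI_CONTEXT:
        return "gpt-5-mini"
    return "gpt-5.4"  # larger-context fallback named in the item above
```

Re-run your eval suite after widening the routing threshold: inputs that previously escalated now hit the smaller model, and long-context recall is exactly where it may regress.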
AWS Bedrock Knowledge Base agentic retrieval moves to GA
Bedrock's Knowledge Bases now support agentic retrieval as a generally available feature: the LLM decides what to query and when, instead of a one-shot retrieve-then-generate pass. Expect roughly 3 to 5x the per-query cost, since the model issues multiple retrievals, but answer quality on multi-hop questions improves materially. Toggle it via the retrievalConfiguration.vectorSearchConfiguration.overrideSearchType field.
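A hedged sketch of what that toggle looks like from boto3. The field path follows the item above, but the "AGENTIC" enum value, the knowledge base id, and the helper name are assumptions; check the Bedrock API reference for the exact accepted values.

```python
# Sketch: building a Bedrock Knowledge Base retrieval configuration with
# agentic retrieval switched on. The "AGENTIC" enum value is an assumption;
# the field path follows the changelog item.

def agentic_retrieval_config(num_results: int = 5) -> dict:
    """Retrieval config letting the model plan multiple retrievals per query."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": num_results,
            "overrideSearchType": "AGENTIC",  # assumed enum value
        }
    }

# In a real call (credentials and KB id assumed):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   resp = client.retrieve(
#       knowledgeBaseId="KB123EXAMPLE",
#       retrievalQuery={"text": "multi-hop question"},
#       retrievalConfiguration=agentic_retrieval_config(),
#   )
```

Because each query can now fan out into several retrievals, budget alerts tuned for one-shot retrieval will under-count; revisit them before flipping this on fleet-wide.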
AWS Bedrock changelog