Jahanzaib
Updated Every Monday

AI Agent News

What actually shipped in the AI agent world this week. Models, frameworks, benchmarks, production lessons. No hype, no LinkedIn theater. Written by an engineer who ships these systems for clients.

Latest: Week of April 20, 2026 · 3 weeks archived
This week

Week of April 20, 2026

Anthropic cuts Sonnet 4.6 cached-input pricing to 10% of base. OpenAI quietly raises GPT-5 mini context to 256K. AWS Bedrock moves Knowledge Base agentic retrieval to GA.

Model

Sonnet 4.6 cached input tokens now cost 90% less than uncached ones

Anthropic dropped cached input pricing on Sonnet 4.6 from 25% to 10% of the base input rate, so a cache hit now costs 90% less than an uncached token. For RAG agents that reuse system prompts and retrieved context across many tool-calling iterations, real-world spend drops 30 to 70%. The 5-minute TTL is unchanged. Worth turning on by default for any production agent that has not enabled it already.
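
To see where a figure in that 30 to 70% range can come from, here is a rough back-of-the-envelope sketch. The workload numbers and the $3.00 per million token base price are placeholders, not quoted prices, and the model is simplified: the first call pays the base rate for the prefix, and cache-write surcharges and TTL expiry are ignored.

```python
def input_cost(prefix_tokens, fresh_tokens, calls, base_per_mtok, cached_multiplier=None):
    """Input-side cost in dollars for `calls` requests that share one prefix.

    cached_multiplier is the cached-read price as a fraction of the base
    rate (0.25 before the change, 0.10 after). None means caching is off.
    Simplification: the first call pays the base rate for the prefix;
    cache-write surcharges and TTL expiry are ignored.
    """
    per_token = base_per_mtok / 1_000_000
    if cached_multiplier is None:
        return calls * (prefix_tokens + fresh_tokens) * per_token
    first = (prefix_tokens + fresh_tokens) * per_token
    later = (calls - 1) * (prefix_tokens * cached_multiplier + fresh_tokens) * per_token
    return first + later


# Hypothetical workload: 8,000-token shared prefix, 2,000 fresh tokens per
# call, 10 tool-calling iterations, placeholder base price of $3.00 per MTok.
uncached = input_cost(8_000, 2_000, 10, 3.00)
old = input_cost(8_000, 2_000, 10, 3.00, cached_multiplier=0.25)
new = input_cost(8_000, 2_000, 10, 3.00, cached_multiplier=0.10)
saving_vs_uncached = 1 - new / uncached

print(f"uncached ${uncached:.4f}, at 25% ${old:.4f}, at 10% ${new:.4f}")
print(f"saving vs uncached: {saving_vs_uncached:.1%}")
```

Under these made-up numbers the input-side bill falls by roughly 65% against no caching. The saving shrinks as the fresh, uncacheable share of each call grows, which is why the realistic range is wide.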

Anthropic pricing docs
Model

GPT-5 mini context window quietly raised to 256K

OpenAI updated the gpt-5-mini context window to 256K tokens with no announcement, just a model card refresh. Same per-token pricing. Practical impact: classification and routing agents that needed to fall back to GPT-5.4 for long inputs can now stay on the cheaper model. Watch your evals though. Bigger context does not mean better attention to far-back tokens.
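
A minimal routing sketch for that fallback logic. The constant names, the reading of "256K" as 256,000 tokens, the spelling of the fallback model identifier, and the 4,000-token output reserve are all assumptions for illustration. It only checks whether the input fits; it says nothing about whether quality holds at that length, which is what the evals are for.

```python
CHEAP_MODEL = "gpt-5-mini"
CHEAP_CONTEXT_TOKENS = 256_000  # assumption: "256K" read as 256,000 tokens
FALLBACK_MODEL = "gpt-5.4"      # assumption: identifier spelling is illustrative


def pick_model(input_tokens, reserved_output_tokens=4_000):
    """Stay on the cheaper model whenever input plus output headroom fits."""
    if input_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + reserved_output_tokens <= CHEAP_CONTEXT_TOKENS:
        return CHEAP_MODEL
    return FALLBACK_MODEL


print(pick_model(100_000))
print(pick_model(300_000))
```

In practice you might set the threshold below the hard limit and lower it further if your evals show accuracy dropping on long inputs.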

Framework

AWS Bedrock Knowledge Base agentic retrieval moves to GA

Bedrock's Knowledge Base now supports agentic retrieval as a generally available feature: the LLM decides what to query and when, instead of one-shot retrieve-then-generate. It costs roughly 3 to 5x more per query because the model makes multiple retrievals, but answer quality on multi-hop questions improves materially. Toggle it via the retrievalConfiguration.vectorSearchConfiguration.overrideSearchType field.
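
For orientation, this is roughly where that field sits in a Knowledge Base Retrieve request. The helper name and placeholder IDs are hypothetical. The value that selects agentic retrieval is not stated here, so it is left as a parameter; "HYBRID" below is just one of the long-standing documented values, used to show the shape.

```python
def build_retrieve_request(knowledge_base_id, query, search_type, top_k=5):
    """Build keyword arguments for a Bedrock Knowledge Base retrieve call.

    search_type is passed through untouched to overrideSearchType.
    Check the Bedrock docs for the value that enables agentic retrieval.
    """
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                "overrideSearchType": search_type,
            }
        },
    }


# Placeholder knowledge base ID and query, for illustration only.
request = build_retrieve_request("KB_ID_PLACEHOLDER", "Which policy covers refunds?", "HYBRID")

# With AWS credentials configured, the request would be sent like this:
# import boto3
# response = boto3.client("bedrock-agent-runtime").retrieve(**request)
```

Keeping the request as a plain dict makes it easy to log the exact configuration next to per-query cost, which helps when comparing the 3 to 5x overhead against answer quality.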

AWS Bedrock changelog

Past Issues

All previous weekly roundups, newest first.

Week of April 13, 2026

OpenClaw hits 250K GitHub stars. New independent benchmark shows Sonnet 4.6 leads on tool-use reliability. n8n 2.5 ships native MCP server support.

Framework

OpenClaw passes 250K GitHub stars

Peter Steinberger's open-source AI agent framework crossed 250,000 stars this week, making it the second-most-starred AI repo on GitHub, behind a major OpenAI release. The growth came from enterprise adoption: Cloudflare, Shopify, and several Fortune 500 finance teams shipped OpenClaw-based agents in Q1.

Benchmark

New benchmark: Sonnet 4.6 wins on tool-use reliability

BenchLM published a 200-task benchmark focused specifically on tool-use reliability: does the model pick the right tool, fill arguments correctly, and recover from tool errors? Claude Sonnet 4.6 scored 94.3% vs GPT-5.4 at 88.1% and Gemini 2.5 Pro at 82.7%. The gap was largest on multi-tool sequences (5+ tool calls), where Sonnet held quality and the others degraded sharply.
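
If you want a similar check on your own agent, the three questions map onto a simple per-task record. This is a sketch of one possible scoring scheme, not BenchLM's actual method, which is not described here; the class and field names are made up.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    right_tool: bool            # did the model call the expected tool?
    args_correct: bool          # were the arguments filled correctly?
    recovered_from_error: bool  # True if no error occurred or the model recovered


def reliability(results):
    """Share of tasks where all three checks pass."""
    if not results:
        raise ValueError("no results to score")
    passed = sum(
        r.right_tool and r.args_correct and r.recovered_from_error
        for r in results
    )
    return passed / len(results)


# Four hypothetical tasks, one with a wrong argument.
results = [TaskResult(True, True, True)] * 3 + [TaskResult(True, False, True)]
score = reliability(results)
print(f"{score:.1%}")
```

Scoring multi-tool sequences separately from single-call tasks is worth doing, since that is where the reported gap was largest.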

BenchLM tool-use benchmark
Tooling

n8n 2.5 ships native MCP server support

n8n added a native MCP server node, letting any n8n workflow expose itself as an MCP tool to Claude, Cursor, or Claude Code. Previously this required custom HTTP wrapping. Big deal for anyone running n8n as their automation backbone. Your existing workflows are now agent-callable without code.
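
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds one; the tool name and arguments are hypothetical, and how n8n names the tools it exposes is not covered here.

```python
import json


def mcp_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request of the kind MCP clients send."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical workflow exposed as a tool named "create_invoice".
message = mcp_tool_call(1, "create_invoice", {"customer_id": "c_123", "amount": 49.0})
wire = json.dumps(message)
print(wire)
```

You would not normally write this by hand, since Claude, Cursor, or Claude Code send it for you, but it is useful to recognize when reading server logs.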

Week of April 6, 2026

Gartner raises AI agent market forecast to $94B by 2028. Vapi launches agent voice testing harness. Two notable production failures published as post-mortems.

Business

Gartner revises AI agent market forecast upward to $94B by 2028

Gartner's revised forecast adds $30B to its prior estimate, citing accelerated enterprise pilot-to-production conversion (now 31% vs 14% a year ago). The growth is concentrated in 5 verticals: financial services, healthcare admin, customer support, legal, and field service. Voice agents specifically are forecast at $14B of the total, the biggest single segment.

Gartner article
Tooling

Vapi launches agent voice testing harness

Vapi released a synthetic-conversation testing harness that runs your voice agent through hundreds of scripted scenarios (angry customer, accent variations, background noise, transfer requests). Prices start at $99/month. Important because manual voice agent testing is the #1 reason production launches slip. Automated coverage targets exactly that.

Research

Two production AI failures get public post-mortems

A major airline's customer support agent issued $40K in unauthorized vouchers over two days. Root cause was a prompt injection in a forwarded email triggering a "refund" tool call. A real estate brokerage's lead-qualification agent silently dropped 18% of leads because of a malformed tool schema that returned no error. Both teams published detailed post-mortems with the prompts, tool definitions, and fixes. Required reading for anyone shipping prod agents.
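
Both failure modes point at the same kind of guard: validate every tool call before executing it, and fail loudly. The sketch below is a generic illustration, not either team's published fix; the tool names, schemas, and the "untrusted content" flag are all hypothetical.

```python
class ToolCallRejected(Exception):
    """Raised instead of silently dropping or executing a bad tool call."""


# Hypothetical tool registry: required argument names and types per tool.
TOOL_SCHEMAS = {
    "issue_refund": {"order_id": str, "amount": float},
    "qualify_lead": {"email": str, "score": int},
}
# Hypothetical policy: tools that move money need a trusted trigger.
SENSITIVE_TOOLS = {"issue_refund"}


def check_tool_call(name, args, triggered_by_untrusted_content):
    """Return True if the call may run; raise ToolCallRejected otherwise."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        raise ToolCallRejected(f"unknown tool: {name}")
    for field, expected_type in schema.items():
        if field not in args:
            raise ToolCallRejected(f"{name}: missing argument {field!r}")
        if not isinstance(args[field], expected_type):
            raise ToolCallRejected(f"{name}: {field!r} must be {expected_type.__name__}")
    if name in SENSITIVE_TOOLS and triggered_by_untrusted_content:
        raise ToolCallRejected(f"{name}: needs human approval for untrusted input")
    return True


# A well-formed lead qualification passes.
ok = check_tool_call("qualify_lead", {"email": "a@b.co", "score": 7}, False)

# A refund triggered by forwarded email content is blocked, and the error is visible.
try:
    check_tool_call("issue_refund", {"order_id": "o1", "amount": 25.0}, True)
except ToolCallRejected as exc:
    print(f"rejected: {exc}")
```

The important design choice is that every rejection raises. A malformed call that returns nothing is how 18% of leads disappear without anyone noticing.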
