Budget agents just got cheaper: Gemini 2.5 Flash is our new pick

Google Gemini logoGoogle GeminiVerdict changedJune 6, 2026Verdict change
What happened
Gemini 2.5 Flash shipped native structured tool calling — the gap that kept it out of our budget verdict.
Why it matters
Cost-sensitive, tool-heavy agents can now run at roughly half the per-call cost with no reliability penalty.
What to do
If your agents run on GPT-4o mini for cost reasons, re-benchmark on Flash this week.

Google shipped native structured tool calling for Gemini 2.5 Flash, closing the one gap that kept it out of our budget-agents verdict.

We ran our standard reliability suite — 200 tool-calling tasks across retrieval, scheduling, and data extraction — and Flash completed them at parity with GPT-4o mini while costing roughly half per call.

If your agents are cost-sensitive and tool-heavy, the math now clearly favors Flash. For agents that need longer reasoning chains, our mid-tier picks are unchanged.

GPT-4o miniprevious pick
Gemini 2.5 Flashnew pick

Native tool use landed and held up in our reliability runs — at half the per-call cost.

Affected tools & models

Never need to catch up again

The weekly delta — only verdict changes and act-now items. No digest filler.