Budget agents just got cheaper: Gemini 2.5 Flash is our new pick
- What happened
- Gemini 2.5 Flash shipped native structured tool calling — the gap that kept it out of our budget verdict.
- Why it matters
- Cost-sensitive, tool-heavy agents can now run at roughly half the per-call cost with no reliability penalty.
- What to do
- If your agents run on GPT-4o mini for cost reasons, re-benchmark on Flash this week.
Google shipped native structured tool calling for Gemini 2.5 Flash, closing the one gap that kept it out of our budget-agents verdict.
We ran our standard reliability suite — 200 tool-calling tasks across retrieval, scheduling, and data extraction — and Flash completed them at parity with GPT-4o mini while costing roughly half per call.
If your agents are cost-sensitive and tool-heavy, the math now clearly favors Flash. For agents that need longer reasoning chains, our mid-tier picks are unchanged.
GPT-4o miniprevious pick
Gemini 2.5 Flashnew pick
Native tool use landed and held up in our reliability runs — at half the per-call cost.
Affected tools & models
Never need to catch up again
The weekly delta — only verdict changes and act-now items. No digest filler.