Google Cuts Gemini Prices and Shifts the AI Race to Unit Cost

Ishan Crawford 2 months ago 0 25

Google walked into Shoreline Amphitheatre on Tuesday with a $100 subscription tier, a $50 monthly haircut on its top plan, and a claim that enterprises shifting workloads to Gemini could save up to $1 billion a year. The headline number drew applause. The pricing sheet is the actual policy.

For two years, AI launches were graded on model benchmarks. The grading rubric flipped this week. The new Gemini AI Ultra tier at $100 a month, the reduction of the higher-end plan from $250 to $200, and the per-token sheet behind Gemini 3.5 Flash together tell the market that Google has decided the next phase of the race is fought on cost per useful token, not on leaderboard wins.

The Hundred-Dollar Ultra Tier and the $50 Cut

At its annual developer conference, Google introduced a fresh AI Ultra plan priced at $99.99 a month and lowered its existing top-end subscription to $199.99, a step down from the $249.99 it carried since launch. The cheaper tier ships with five times the Gemini app and Antigravity usage limits of the Pro plan, 20 terabytes of cloud storage, and a YouTube Premium individual membership bundled in. The reduced flagship plan keeps its 20x limit ceiling and adds Spark and Project Genie access.

Sundar Pichai, Google’s chief executive, framed the moves as part of an expansion that now reaches 900 million monthly Gemini users across 230 countries, with AI Overviews in Search serving roughly 2.5 billion users globally. Those figures land roughly twelve months after Gemini was sitting closer to 400 million monthly users, a doubling that gives Google a usage base wide enough to justify cutting unit prices.

The table below summarises where the consumer-developer stack now sits versus the prior month.

Plan	Old price	New price	Usage ceiling vs Pro
AI Ultra (new mid-tier)	did not exist	$99.99 / month	5x
AI Ultra (flagship)	$249.99 / month	$199.99 / month	20x
AI Pro	$19.99 / month	$19.99 / month	1x baseline

The middle rung is the interesting one. It hands enterprise testers and serious solo developers a five-fold usage envelope at less than half of last week’s flagship price. That gap is large enough to pull workloads down a tier rather than push them up.

Google Gemini pricing cuts at I/O 2026 reshape enterprise AI affordability and adoption.

Why Token Math Is Now Strategy

For most of the last two years, the model business looked like a capability scoreboard. Pricing was a footnote. What changed is that the marginal Gemini call now runs through Google’s own silicon stack, where the seventh-generation Tensor Processing Unit (TPU, Google’s custom AI chip family) handles inference at a unit cost the company controls end to end. Rivals running on rented GPU capacity from Microsoft, Amazon, or CoreWeave are exposed to a different cost curve.

That asymmetry is starting to show up in pricing decks rather than press releases.

900 million monthly Gemini users now sit inside Google’s own apps, giving the company a fixed cost base it can spread across.
2.5 billion monthly users see AI Overviews inside Search, a workload that has to be served profitably whether the user pays or not.
$1 billion is the annual savings Google is telling large enterprise buyers they can extract by moving heavy workflows to Gemini and Vertex AI.
Roughly 80% is the rough industry-wide drop in headline large-language-model API prices over the past twelve months, according to comparison data tracked across Anthropic, OpenAI, and Google.

A pricing analyst who covers the hyperscalers put the shift bluntly during the keynote week. Cost per useful token has become the new floor metric, and the providers that own their inference stack will set the price the rest of the market has to match.

Inside Gemini 3.5 Flash’s Price Sheet

The consumer tiers are the storefront. The per-token sheet behind the Gemini 3.5 model family is what enterprise buyers actually run their math against. Three numbers matter most.

Input and Output Token Pricing

Gemini 3.5 Flash, the new agent-tuned model the company shipped on May 19, is priced at $1.50 per million input tokens and $9.00 per million output tokens, according to the published Gemini developer API pricing page. That is roughly triple the rate Google charged for the prior Flash model, a detail Google did not lead with on stage.

The justification for the increase is that the new Flash scores within two points of Anthropic’s Claude Opus 4.7 on common reasoning benchmarks while costing roughly a third as much per token. A jump on Flash absolute pricing reads, in that frame, as a reclassification: this is no longer a small-and-cheap model, it is a frontier model wearing a midweight badge.

The Cached-Input Discount

Cached input is billed at $0.15 per million tokens, a 90% discount on the standard input rate. For agentic workloads that repeatedly fetch the same context window across a long task, that line item is the one that drags effective unit cost down toward where the old Flash sat. Enterprises that engineer their prompts for cache reuse will pay closer to the headline saving Google advertised on stage.

The Agent Signal in the Pricing

The Flash output rate of $9 per million tokens is the giveaway that Google expects agent workloads, not chat sessions, to drive the next leg of usage. Agents talk far more than they listen. Pricing the output side at six times the input side puts the meter where the multi-step reasoning is, which is where Google wants developers building Antigravity workflows and Vertex AI agents to land their business case.

OpenAI and Anthropic Are Forced to the Same Question

Google’s cuts do not happen in a vacuum. Anthropic’s Claude Opus 4.7 currently sits at $5 per million input tokens and $25 on output. OpenAI’s GPT-5.4 lists at $2.50 input and $15 output. Both are now visibly more expensive than Gemini 3.5 Flash at the frontier band, even before Google’s caching discount is applied.

Model	Input ($/M tokens)	Output ($/M tokens)	Cached input
Gemini 3.5 Flash	$1.50	$9.00	$0.15
OpenAI GPT-5.4	$2.50	$15.00	90% off (cached)
Anthropic Claude Opus 4.7	$5.00	$25.00	90% off (cached)
Anthropic Claude Sonnet 4.6	$3.00	$15.00	90% off (cached)

The pressure now points two ways. OpenAI, which signed a multi-year capacity deal with Microsoft Azure, pays for the inference infrastructure it runs on. Anthropic splits between Amazon Web Services and Google Cloud and pays accordingly. Neither has the option to slash sticker prices without bleeding margin. Google can afford a price war because it owns the meter.

Meta has chosen a different exit. Last week the company restructured 15,000 employees, telling roughly 8,000 staff to leave and moving 7,000 more into four newly created AI organisations. That reshuffle, covered in detail in our coverage of Meta’s four new AI pods and the 8,000 layoffs that funded them, is the labour-side version of the same equation Google answered with pricing.

The Billion-Dollar Enterprise Pitch

The $1 billion annual savings number was the loudest claim of the keynote and the one most likely to be tested in procurement meetings over the next quarter. Google’s own examples cited enterprise buyers running customer-service automation, software-development assistance, and back-office workflow rewrites on Gemini and Vertex AI.

The pitch translates into three buyer-side propositions enterprises will weigh through the back half of the year.

Workflow consolidation: replacing several point AI tools with a single Gemini-plus-Vertex stack so a Fortune 500 buyer pays one vendor instead of four, with predictable token billing.
Engineering productivity: shipping Gemini 3.5 Flash through GitHub Copilot, where it is now generally available, to extend AI-assisted coding from senior engineers to broader teams.
Customer-service throughput: pricing agent runs at a unit rate that lets contact-centre operators model genuine cost-per-call savings against an outsourced human floor.

Whether the savings hit a billion is a function of how aggressively a buyer engineers around cached inputs and which workloads they retire entirely. The number is plausible for a hyperscale buyer running global support. It is a brochure number for most others.

What the Middle East and Asia Stand to Gain

For markets where AI procurement budgets sit closer to government desks than to private CIOs, the lower price stack matters more. Sovereign cloud projects in the United Arab Emirates and Saudi Arabia, telecom modernisation tenders across Southeast Asia, and smart-city programmes in India all run against fixed multi-year budgets where every dollar of token cost compounds.

Banking regulators in the Gulf have spent the last year writing AI-governance frameworks that assume model-vendor pricing keeps falling. Google’s I/O move is the first signal that the assumption holds for at least another twelve months. Telcos, public-sector cloud buyers, and government services operators were already piloting agentic systems on Vertex AI and now have a fresh case for moving from pilot to production.

The same logic applies to consumer-facing product launches across Google’s portfolio. The smart-glasses partnerships unveiled at the same conference, covered in our piece on Google and Samsung’s Android XR eyewear bet, will sit on the same inference stack and inherit the same cost curve.

The Race Has a New Finish Line

For most of 2024 and 2025, the question every AI vendor faced was whether their next model would clear the benchmark a competitor set last month. The question that landed at Shoreline this week is different. It is whether the company can sell that benchmark at a price the buyer can defend to a finance committee.

Google chose its answer in public. If OpenAI and Anthropic match the Gemini per-token sheet inside the next two earnings cycles, the floor on frontier AI pricing settles roughly where Google put it this week and the entire industry runs on thinner gross margins for a year or more. If they refuse to follow, Google walks away with the workloads that get priced by procurement rather than by demo.