Engineering brief

Google's Gemini 3.5 Flash: Hidden Costs and Broken Tooling

Theo - t3․gg

The Brief

Gemini 3.5 Flash looks fast in benchmarks but burns output tokens at 2x the expected rate, making it deceptively expensive in production. With its price tripled to $9 per million output tokens, it is now the fourth most costly model to run. Worse, Google is force-migrating users of its open-source Gemini CLI to a closed-source rewrite with broken scrolling and state management, with a June 18th deadline. For teams building on GCP or integrating Gemini, this pattern of inflated costs and breaking changes is a serious reliability signal.

Decision relevance

Read this for workflow impact, implementation trade-offs, and the claims that need technical scrutiny before they reach team planning.

Summary

Google's Gemini 3.5 Flash launch looks strong on a benchmark chart but collapses under real-world scrutiny. The model scores well on agentic benchmarks such as Terminal-Bench and SWE-bench, but its apparent speed is misleading: it generates excessive output tokens, which makes it 2x slower in practice and the fourth most expensive model to actually run. The per-token price has tripled again, reaching $9 per million output tokens, for a model that burns resources without commensurate gains. For engineering teams, this creates a dangerous gap between the marketing and cost-effective deployment: the model looks competitive on synthetic benchmarks but delivered broken, incomplete code in practical tests such as the 'Fish Slap' game rewrite, where every other model succeeded.
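The cost argument is simple arithmetic: what a task costs depends on the per-token price multiplied by how many tokens the model emits, so a verbose model can cost more than its list price suggests. Below is a minimal sketch of that calculation. The $9 per million output tokens and the 2x token multiplier come from the brief; the baseline of 50,000 output tokens per task and all function and variable names are hypothetical, chosen only for illustration.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of the output tokens for one task, in US dollars."""
    return output_tokens / 1_000_000 * price_per_million_usd


PRICE_PER_MILLION_USD = 9.00   # output price cited in the brief
BASELINE_TOKENS = 50_000       # hypothetical: tokens a terser model would emit per task
TOKEN_MULTIPLIER = 2.0         # the brief's "2x the expected rate"

# Cost if the model emitted only the baseline number of tokens.
expected_cost = output_cost_usd(BASELINE_TOKENS, PRICE_PER_MILLION_USD)

# Cost once the extra output tokens are counted.
actual_cost = output_cost_usd(
    int(BASELINE_TOKENS * TOKEN_MULTIPLIER), PRICE_PER_MILLION_USD
)

print(f"expected per task: ${expected_cost:.2f}")
print(f"actual per task:   ${actual_cost:.2f}")
```

Under these assumed numbers the per-task cost doubles even though the list price is unchanged. Teams evaluating the model would want to replace the baseline with token counts measured on their own workloads.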

The tooling situation is worse. Google is moving users off the open-source Gemini CLI, which had momentum, community trust, and 100k GitHub stars, and onto the closed-source Antigravity CLI, a bug-ridden rewrite from scratch in Go. Basic functions like scrolling, copy-paste, and state management are broken. The forced migration on June 18th removes the open option entirely. It also fits a worrying pattern: Google's internal politics have sidelined the original, capable team in favor of an acquired group that is visibly copying competitors, to the point of leaving competitor folder names in demo videos. Teams that integrated Gemini CLI into custom pipelines or relied on its open-source extensibility face a hard deadline and a broken replacement.

Perhaps more alarming for infrastructure reliability is the anecdote about Railway, a hosting service spending $2M/month on Google Cloud, whose entire account was banned without warning, causing a full outage. This is not an isolated incident; the pattern includes the catastrophic deletion of a $135B Australian pension fund's account two years ago. These failures point to a systemic gap in Google Cloud's operational maturity compared to AWS or even Azure. For leaders responsible for service reliability, this raises serious questions about whether Google Cloud's opaque account management could become an existential threat to a business. The video's core warning: Google's corporate structure prevents its talented people from shipping trustworthy products, and betting critical infrastructure on Google's stack is becoming increasingly indefensible.

Why It Matters

The real-world cost and reliability of Google's AI tools are diverging from what its benchmarks and pricing suggest, risking blown budgets and broken CI/CD pipelines for adopters.

Editorial analysis

Key claims

  • Gemini 3.5 Flash wastes tokens at high cost, and Google’s CLI strategy breaks community trust and tooling stability.

Practical use cases

  • Use this as input for tooling evaluation, workflow planning, and technical due diligence.

Risks / caveats

  • Some of the video's complaints are about Google's corporate culture in general and are not specific to AI or Cloud reliability and tooling decisions.

Who should care

  • Engineering managers, tech leads, and CTOs evaluating AI or developer tooling decisions.

Bottom Line

Gemini 3.5 Flash wastes tokens at high cost, and Google’s CLI strategy breaks community trust and tooling stability.

Video and thumbnails remain the property of their respective creators. tldw.news provides editorial analysis, commentary, and discovery links to original content.