Backed by Y Combinator

Inference Radar · 2026-W14 · Apr 2–8, 2026 · 17 min read

Gemma 4 Ignites the KV-Cache Wars

This week’s open-source inference story wasn’t just “Gemma 4 landed.” It was that one model family forced nearly every serious runtime — cloud servers, local apps, Apple stacks, and edge frameworks — to confront the same bottlenecks at once: parser correctness, tool use, cache pressure, and hardware-specific fallbacks. The result is a clearer picture of where inference is heading: memory efficiency and deployment breadth now matter as much as raw tokens per second.

1,816 commits
1,330 PRs
996 issues
101 releases
67 active repos
RunAnywhere

On-device AI inference research and infrastructure. Building the fastest engines for the hardware you already own.

© 2026 RunAnywhere, Inc.