Google shipped Gemma 4 into the AICore Developer Preview this month, and the headline numbers are the kind that actually move what's possible for indie developers: 4x faster inference than the previous generation, 60% less battery drain, native function calling, structured JSON output, and a 2B variant optimized specifically for on-device speed.
The models run locally. No API key. No cloud round-trip. No latency spike when the user's subway signal drops.
what changed and why it matters
Previous Gemini Nano iterations were useful but narrow — good for specific, well-bounded tasks, and clearly designed for Google's own system-level features rather than third-party apps. Gemma 4 is different. The two on-device variants (E2B for speed, E4B for reasoning depth) are being positioned as the foundation for Gemini Nano 4 across AICore-enabled devices. Code you write against Gemma 4 today will automatically inherit the performance optimizations when Nano 4 lands on user hardware later this year.
That forward-compatibility path is the part worth paying attention to. Google is building a stable API contract between developer code and on-device inference — which means the engineering investment you make today doesn't evaporate when the underlying model gets upgraded.
the practical unlock for indie apps
Here's the thing most developers miss about on-device AI: the use cases aren't the flashy ones. They're the quiet UX wins that used to require a server call.
Smart autocomplete in a notes app that actually understands context — on-device. Classifying a photo journal entry locally so search works without uploading everything to your backend — on-device. A task management app that can surface overdue items based on what the user was trying to accomplish, not just what they typed — on-device. None of these need the reasoning depth of GPT-4. They need something fast, private, and always-available. That's exactly the Gemma E2B profile.
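To make the classification use case concrete, here's a minimal sketch of local journal-entry tagging. The `OnDeviceModel` interface is a hypothetical stand-in for whatever completion API the AICore SDK actually exposes (those names aren't confirmed here); the point is the pattern: constrain the model to a closed label set so the output stays parseable, and fail soft when it drifts.

```kotlin
// Sketch only: `OnDeviceModel` is a hypothetical stand-in for the real
// AICore completion client, which would run inference locally.
fun interface OnDeviceModel {
    fun complete(prompt: String): String
}

// Classify a journal entry into one of a fixed label set, entirely on-device.
// A closed label list keeps the output trivially machine-checkable.
fun classifyEntry(model: OnDeviceModel, entry: String, labels: List<String>): String {
    val prompt = """
        Classify the journal entry into exactly one of: ${labels.joinToString(", ")}.
        Reply with the label only.
        Entry: $entry
    """.trimIndent()
    val raw = model.complete(prompt).trim()
    // Fall back to a default label if the model answers off the list.
    return labels.firstOrNull { it.equals(raw, ignoreCase = true) } ?: labels.first()
}

fun main() {
    // Stub model for the sketch; a real app would call the AICore runtime.
    val stub = OnDeviceModel { "travel" }
    val label = classifyEntry(
        stub,
        "Landed in Lisbon, found a great pastel de nata spot.",
        listOf("other", "travel", "work", "food")
    )
    println(label)  // prints "travel"
}
```

The fallback line matters more than it looks: on-device models are small, and treating their output as untrusted input rather than structured data is what keeps features like local search robust.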
The battery improvement matters more than it sounds. Previous on-device model integrations often came with a performance tax that users noticed. A 60% battery reduction changes the math on how frequently you can invoke inference without degrading the overall app experience.
the new multimodal and multilingual baseline
Gemma 4 supports over 140 languages natively. If you're building for markets outside English-speaking regions — and if you're not, you should be — that's a meaningful shift. Local-language AI features that previously required cloud inference (and the latency and cost that come with it) can now run entirely on the device.
The function-calling and structured JSON output support is the other practical unlock. It means you can treat the model as a reasoning layer that feeds clean data back into your app logic, rather than wrestling with free-text outputs and building your own parsing layer on top.
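The shape of that reasoning layer is worth sketching. Assuming the SDK hands you the model's structured output as a function name plus arguments (the `FunctionCall` type below is hypothetical, not a confirmed AICore API), the app-side pattern is a small registry that routes model-emitted calls into real app logic:

```kotlin
// Sketch of the dispatch pattern for structured function-calling output.
// `FunctionCall` is a hypothetical type; assume the SDK has already parsed
// the model's JSON into a name plus string arguments.
data class FunctionCall(val name: String, val args: Map<String, String>)

class FunctionRegistry {
    private val handlers = mutableMapOf<String, (Map<String, String>) -> String>()

    fun register(name: String, handler: (Map<String, String>) -> String) {
        handlers[name] = handler
    }

    // Route a model-emitted call to app logic. Unknown names fail softly
    // rather than crashing, because model output is untrusted input.
    fun dispatch(call: FunctionCall): String =
        handlers[call.name]?.invoke(call.args)
            ?: "error: unknown function '${call.name}'"
}

fun main() {
    val registry = FunctionRegistry()
    registry.register("addTask") { args ->
        "added '${args["title"]}' due ${args["due"]}"
    }
    // In a real app, this FunctionCall would come from the model's
    // structured JSON output rather than being constructed by hand.
    val result = registry.dispatch(
        FunctionCall("addTask", mapOf("title" to "file taxes", "due" to "friday"))
    )
    println(result)  // prints "added 'file taxes' due friday"
}
```

The registry indirection is the whole trick: the model never touches app state directly, and every capability it can invoke is something you explicitly opted in by registering a handler.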
the submission side of the equation
One thing that comes up when developers ship AI features: App Store and Google Play listings lag behind the product. Your screenshots probably won't capture what the AI feature actually does, and your description probably uses vague language that doesn't tell users why the on-device aspect matters for their privacy.
This is a store listing problem as much as it's a product problem. If your app uses on-device AI and your listing doesn't say so explicitly — "all processing happens on your device, nothing is sent to a server" — you're leaving a meaningful conversion driver on the table. Stora's AI listing generator will pull that feature context into your description and metadata automatically, but even if you're writing it manually, make the on-device angle explicit. It converts.
the bottom line
On-device AI on Android has had a "someday" feel for a few years now — interesting, technically credible, but not quite ready for products you'd ship to users. Gemma 4 in AICore moves the bar. The latency is low enough, the battery cost is low enough, and the API surface is stable enough that the engineering tradeoff now favors building on it rather than waiting.
Get the AICore Developer Preview. Build something small against the E2B variant. See what your app feels like when inference is instant and offline.
Gemma 4 is available now via the AICore Developer Preview. The E2B and E4B variants target on-device inference; larger variants (26B MoE, 31B Dense) target server-side workloads.