Most Android developers who want to add AI features to their apps reach for a cloud API. It's the obvious path — you get a capable model, a clean SDK, and results that look impressive in a demo. Then you ship it, and reality arrives: API costs that scale with usage, latency that spikes on weak connections, privacy questions from users who don't love the idea of their data leaving the device, and a dependency on a third-party server your app can't control.
On-device AI sidesteps all of this. And in 2026, it's no longer a compromise. Google's Gemma 4 models — available now through the AICore Developer Preview — run inference locally on-device, with no API key, no cloud round-trip, and performance that holds up even when the user is offline.
This guide covers how it works, when to use it, and how to actually integrate it into a real Android app.
what AICore is and how it fits into the Android AI stack#
AICore is Google's on-device AI runtime for Android. It manages the lifecycle of on-device models — including Gemini Nano — and exposes them to third-party apps through a stable API. Think of it as the hardware abstraction layer between your app code and the neural accelerators in the device.
Before AICore, using on-device ML in Android meant either bundling your own model with the app (increasing APK size and maintenance burden) or using ML Kit for specific, pre-defined tasks. AICore introduces a third option: shared, OS-managed models that multiple apps can use without each one bundling its own copy of the model weights.
This matters for indie developers in a specific way: you don't ship the model. Google ships it as part of the device's system software. Your app calls it. The inference happens locally. Your APK doesn't balloon by 500MB.
The models available through AICore are built on Gemma 4, the same architecture that powers Gemini Nano. Code you write against the AICore API today is forward-compatible — when Gemini Nano 4 ships on updated hardware later in 2026, your existing integration picks up the performance improvements automatically.
the two on-device model variants: E2B vs E4B#
Google released Gemma 4 in four sizes. For on-device use cases, two are relevant:
Gemma E2B (Effective 2 Billion) — optimized for maximum speed. Delivers roughly 3x faster inference than E4B. This is your model for latency-sensitive tasks: autocomplete, real-time suggestions, classifying user input on keypress, anything where 200ms feels slow.
Gemma E4B (Effective 4 Billion) — stronger reasoning depth. Takes longer to run but handles more complex instructions, multi-step logic, and nuanced language tasks. Better for summarization, longer-form text generation, or anything where quality matters more than instant feedback.
Both variants are dramatically more efficient than their Gemma 3 counterparts — roughly 4x faster overall, with 60% lower battery consumption. They also support native function-calling and structured JSON output, which makes them practical for apps that need to extract structured data from natural language rather than free-text generation.
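In practice the E2B/E4B split tends to fall out of your latency budget. As a rough sketch of that routing decision — the `TaskProfile` type and the 200ms threshold here are illustrative, not part of the AICore API:

```kotlin
// Illustrative routing between the two variants. The enum mirrors the
// E2B/E4B split described above; the threshold is an assumption.
enum class GemmaVariant { E2B, E4B }

data class TaskProfile(
    val latencyBudgetMs: Long,   // how long the user will tolerate waiting
    val needsDeepReasoning: Boolean
)

fun pickVariant(task: TaskProfile): GemmaVariant =
    // Anything interactive (sub-200ms budget) goes to the fast model;
    // quality-sensitive background work goes to E4B.
    if (task.latencyBudgetMs < 200 || !task.needsDeepReasoning) GemmaVariant.E2B
    else GemmaVariant.E4B
```

Keeping this decision in one small function makes it easy to revisit the thresholds once you have real latency numbers from device testing.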
setting up the AICore Developer Preview#
The AICore Developer Preview is available to developers now. Here's how to get started:
Step 1: Check device compatibility. AICore requires an Android device with a compatible neural accelerator. The Preview currently supports devices with AI hardware from Google (Tensor chips), MediaTek, and Qualcomm. Check the AICore Developer Preview documentation for the current device list.
Step 2: Add the AICore dependency. In your build.gradle:
```kotlin
dependencies {
    implementation("com.google.android.aicore:aicore:1.0.0-alpha")
}
```

Step 3: Check model availability at runtime. Not every device will have the model downloaded. Always check before inferring:
```kotlin
val aiCoreClient = AICoreClient.create(context)
val availability = aiCoreClient.checkAvailability()

when (availability) {
    ModelAvailability.AVAILABLE -> runInference()
    ModelAvailability.DOWNLOADING -> showLoadingState()
    ModelAvailability.NOT_SUPPORTED -> fallbackToCloudOrHideFeature()
}
```

This check is important. On-device AI features should degrade gracefully on unsupported hardware — either falling back to a cloud API or hiding the feature entirely. Don't assume all users will have the model available.
Step 4: Run your first inference. A basic text generation call:
```kotlin
val session = aiCoreClient.createSession(
    ModelConfig(
        model = GemmaModel.E2B,
        temperature = 0.7f,
        maxOutputTokens = 256
    )
)

val response = session.generateContent(
    prompt = "Summarize this in one sentence: $userText"
)

println(response.text)
```

practical use cases that actually make sense on-device#
The best on-device AI features share a few traits: they're latency-sensitive, benefit from privacy guarantees, or need to work offline. Here are patterns that fit naturally.
Smart autocomplete and content suggestions. Notes apps, writing tools, email composers — anywhere the user is typing and would benefit from contextual suggestions. On-device means suggestions appear immediately, without waiting for a network round-trip, and without user text leaving the device.
Local content classification and tagging. If your app stores user-generated content (journal entries, photos with captions, expense notes), on-device inference can automatically classify and tag content without a backend. Search becomes smarter without ever uploading user data.
Adaptive summaries and digests. News aggregators, read-later apps, productivity tools — generating a brief summary of a long piece of content is a good E4B use case. The model runs once on save, stores the summary, and the user benefits every time they return to the item.
Context-aware input validation. Instead of regex-based form validation, use the model to catch ambiguous or likely-incorrect inputs ("did you mean a date in 2026, not 2016?"). Runs in milliseconds with E2B.
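Several of these patterns share a shape: run the model once at write time, cache the result, and reuse it on every read. Sketched with a stubbed model call — `summarize` below stands in for the real `session.generateContent` call, and the in-memory map stands in for your database:

```kotlin
// Summarize-on-save: the model runs once per item; reads hit the cache.
// `summarize` is a stand-in for the real on-device inference call.
class DigestStore(private val summarize: (String) -> String) {
    private val cache = mutableMapOf<String, String>()

    fun save(id: String, content: String) {
        // E4B-style work: slower, quality-sensitive, done once at write time.
        cache[id] = summarize(content)
    }

    fun digest(id: String): String? = cache[id]  // instant on every read
}
```

The payoff is that inference cost is paid once per item rather than once per view, which matters on battery-constrained devices.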
handling multilingual users#
Gemma 4 natively supports over 140 languages. If you're building for markets where English isn't the primary language — Latin America, Southeast Asia, South Asia, the Middle East — this is a significant practical unlock.
Previous on-device models had meaningful English language bias. Features that worked well for English-speaking users often degraded significantly for other locales, which pushed multilingual use cases toward cloud APIs. Gemma 4's multilingual support is trained-in rather than translated-after, which means the quality difference is smaller.
If your app already supports multiple locales in its UI, check whether your on-device AI features provide comparable quality across those locales before shipping. The AICore API passes your prompt to the model as-is — if you send Spanish text, you'll get a Spanish response.
structured output and function-calling#
One of the more underrated Gemma 4 capabilities is native structured JSON output. This means you can ask the model to return data in a schema you define, rather than parsing free text. Practical example — extracting structured data from a freeform user note:
```kotlin
val prompt = """
    Extract the following fields from this expense note and return as JSON:
    - amount (number)
    - currency (string, 3-letter ISO code)
    - description (string)
    - date (string, YYYY-MM-DD format)

    Note: "$userNote"

    Return only valid JSON, no explanation.
""".trimIndent()

val response = session.generateContent(
    prompt = prompt,
    outputFormat = OutputFormat.JSON
)

// Expense is a @Serializable data class (kotlinx.serialization)
val expense = Json.decodeFromString<Expense>(response.text)
```

This pattern — natural language in, structured data out — is what makes on-device AI genuinely useful for apps that store and process user data rather than just displaying generated text.
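Even with a JSON output mode, it's worth being defensive before decoding: language models occasionally wrap output in markdown fences or prepend stray text. A small pre-clean step helps — this is pure stdlib, and the trim-to-outermost-braces heuristic is an assumption on my part, not documented AICore behavior:

```kotlin
// Trim anything outside the outermost JSON object before handing the
// string to a real decoder. Returns null if no object is present, which
// is your cue to retry the prompt or fall back.
fun extractJsonObject(raw: String): String? {
    val start = raw.indexOf('{')
    val end = raw.lastIndexOf('}')
    return if (start in 0 until end) raw.substring(start, end + 1) else null
}
```

A null result here is a cheap, deterministic failure signal, which is far easier to handle than a deserialization exception deep in your data layer.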
the App Store and Play Store listing angle#
Once you've shipped an on-device AI feature, your store listing needs to communicate it clearly. "Powered by on-device AI" is a meaningful differentiator for users who care about privacy — and that population is growing, particularly in markets where data sovereignty is a concern.
The specific language that converts: "All AI processing happens on your device — your data never leaves your phone." This is factually accurate for AICore-based features and directly addresses the privacy anxiety that cloud AI features create. Put it in your description. Consider adding a dedicated screenshot that communicates the privacy benefit.
what to watch for: AICore limitations in the preview#
The AICore Developer Preview is not production-ready for all use cases. A few things to keep in mind:
Device availability is still limited. The preview supports a subset of Android devices with compatible hardware. Until Gemini Nano 4 ships broadly, you can't assume AICore availability for a significant portion of your user base. Always implement graceful fallback.
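That fallback policy is easy to centralize as plain data so it can be unit-tested without the SDK. A minimal sketch — the state names mirror the availability check from the setup section, and nothing here calls AICore itself:

```kotlin
// Availability mirrors the states surfaced by the runtime check;
// FeatureMode is the product-level decision your UI acts on.
enum class Availability { AVAILABLE, DOWNLOADING, NOT_SUPPORTED }

sealed class FeatureMode {
    object OnDevice : FeatureMode()
    object Loading : FeatureMode()
    object CloudFallback : FeatureMode()
    object Hidden : FeatureMode()
}

fun resolveMode(availability: Availability, hasCloudFallback: Boolean): FeatureMode =
    when (availability) {
        Availability.AVAILABLE -> FeatureMode.OnDevice
        Availability.DOWNLOADING -> FeatureMode.Loading
        // Cloud fallback vs. hiding the feature is a product decision,
        // modeled here as a simple flag.
        Availability.NOT_SUPPORTED ->
            if (hasCloudFallback) FeatureMode.CloudFallback else FeatureMode.Hidden
    }
```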
Model download is not guaranteed. Even on a supported device, the model may not be downloaded yet. Budget for a loading state and a fallback path. Don't block the user's core workflow on model availability.
Context window is constrained. On-device models run with a smaller context window than cloud alternatives. For tasks involving long documents, you'll need to chunk input or summarize progressively.
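A simple way to handle that constraint is to split long input on paragraph boundaries, keeping each chunk under a token budget. A sketch using whitespace-separated words as a crude stand-in for tokens — real tokenizers count differently, and the default budget here is illustrative:

```kotlin
// Split text into chunks of at most `maxTokens`, breaking on paragraph
// boundaries. Word count approximates token count; a single paragraph
// larger than the budget still becomes its own (oversized) chunk.
fun chunkForContext(text: String, maxTokens: Int = 1000): List<String> {
    val chunks = mutableListOf<String>()
    val current = StringBuilder()
    var currentTokens = 0

    for (paragraph in text.split("\n\n")) {
        val tokens = paragraph.split(Regex("\\s+")).count { it.isNotEmpty() }
        if (currentTokens + tokens > maxTokens && current.isNotEmpty()) {
            chunks += current.toString().trim()
            current.clear()
            currentTokens = 0
        }
        current.append(paragraph).append("\n\n")
        currentTokens += tokens
    }
    if (current.isNotBlank()) chunks += current.toString().trim()
    return chunks
}
```

For progressive summarization, you would summarize each chunk with E4B and then summarize the concatenated summaries in a final pass.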
Preview API may change. The AICore API is in developer preview — interfaces can change before general availability. Pin your dependency version and review the changelog when updating.
the road ahead#
Google has been clear that code written for Gemma 4 in AICore today will inherit the Gemini Nano 4 optimizations when they ship on consumer hardware. That forward-compatibility path is unusual — it means the investment you make in on-device AI now doesn't lose value when the hardware gets better. Your app gets faster for free.
Android 16 is also shipping new APIs that assume on-device AI as a first-class component — the Writing Tools API, new RangingManager capabilities for device-to-device AI coordination, and system-level AI features that third-party apps can integrate with. Building on AICore now positions your app to participate in these system integrations when Android 16 reaches broad distribution.
On-device AI on Android is no longer a research preview or a power-user feature. It's a production capability with a stable API, real hardware support, and a clear roadmap. The developers who integrate it now will have a meaningful head start when the hardware becomes ubiquitous.
Start with a small, well-defined use case. Get the AICore preview set up. Ship something that makes your app genuinely better for the users who have the hardware — and fails gracefully for the ones who don't. Then you'll be ready when the hardware is everywhere.
AICore Developer Preview documentation: developers.google.com/ml-kit/genai/aicore-dev-preview. Gemma 4 model details: android-developers.googleblog.com