With the release of Gemma 4, Google aims to enable local, agentic AI for Android development through a family of models designed to support the entire software lifecycle, from coding to production.
Gemma 4 models cover a spectrum of capabilities, from efficient on-device variants that power Android apps via the ML Kit GenAI Prompt API, to more powerful models designed to deliver AI-powered coding assistance in Android Studio on desktop.
Gemma 4 includes three models: Gemma E2B, which requires 8GB of RAM and 2GB of storage; Gemma E4B, which requires 12GB of RAM and 4GB of storage; and Gemma 26B MoE, which requires 24GB of RAM and 17GB of storage. The most powerful model is recommended for use as a coding agent on a development machine, while the two smaller variants are suitable for on-device integration.
Gemma 26B MoE enables local, agentic coding without requiring code to be shared with cloud-based AI providers, making it especially valuable for developers working under strict data privacy requirements or in secure enterprise environments. According to Google, it runs efficiently on modern hardware by leveraging local GPU and RAM resources. Additionally, its usage is not constrained by token quotas or network latency. Gemma 26B MoE can be used to design new features or an entire app, refactor existing code, and resolve build/lint errors, Google says.
The two smaller models, Gemma E2B and Gemma E4B, are designed for on-device inference. Specifically, E4B offers stronger reasoning power and is better suited for complex tasks, while E2B is optimized for maximum speed, delivering up to 3x faster inference than Gemma E4B, along with lower latency.
Google says the new models are up to 4x faster than previous versions and use up to 60% less battery. In addition, they deliver higher-quality results for chain-of-thought prompts and conditional reasoning, with better math skills, temporal reasoning, and image processing, for use cases such as chart interpretation, visual data extraction, and handwriting recognition.
Gemma 4 provides the foundation for the next generation of Gemini Nano, which powers AI features on Android devices. Developers can already use it to prototype their apps and prepare them for Gemini Nano 4, which is expected to become available on supported devices later this year. To access Gemma 4 models on Android devices, developers can join the AICore Developer Preview program.
The following Kotlin snippet shows how to use the models:
// Define the configuration with a specific track and preference
val previewFullConfig = generationConfig {
    modelConfig = ModelConfig {
        releaseTrack = ModelReleaseTrack.PREVIEW
        preference = ModelPreference.FULL
    }
}

// Initialize the GenerativeModel with the configuration
val previewModel = GenerativeModel.getClient(previewFullConfig)

// Verify that the specific preview model is available
val previewModelStatus = previewModel.checkStatus()
if (previewModelStatus == FeatureStatus.AVAILABLE) {
    // Proceed with inference
    val response = previewModel.generateContent("If I get 26 paychecks per year, how much should I contribute each paycheck to reach my savings goal of $10k over the course of a year? Return only the amount.")
} else {
    // Handle the case where the preview model is not available
    // (e.g., print out log statements)
}
Gemma 4 models can also be downloaded and run locally using Ollama or LM Studio.