Google Launches Gemma 3 1B For Mobile and Web Apps

Requiring a "mere" 529MB, Gemma 3 1B is a small language model (SLM) specifically meant for distribution across mobile and Web apps, where models must download quickly and be responsive to keep user engagement high.

Thanks to its reduced footprint, Gemma 3 1B can be downloaded quickly and run locally even when no WiFi or cellular connection is available. Running on-device, the model offers minimal latency and incurs no cloud costs. More importantly, user data stays private, since it never needs to leave the device.

The main use case for adopting Gemma 3 1B is integrating a natural language interface into your app:

By including Gemma 3 1B in your app, you can use natural language to drive your application or generate content from in-app data or context, all fully customizable and fine-tunable.

This includes generating descriptions and captions for data, supporting conversation, ingesting long documents to answer user questions using the AI Edge RAG SDK, creating dialog based on current app state, and more.
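The retrieval-augmented flow behind the document-ingestion use case can be sketched without the SDK itself. The snippet below is a generic illustration, not the AI Edge RAG SDK API: it ranks document chunks against the question with a naive word-overlap score (where the SDK would use embeddings and a vector store) and stuffs the best matches into the prompt; the generate parameter stands in for an on-device model call such as the one shown in the Android snippet further below.

```kotlin
// Generic RAG sketch -- NOT the AI Edge RAG SDK API. Retrieval here is a
// naive word-overlap ranking standing in for embedding-based vector search.

// Count how many of the question's words appear in a chunk.
fun overlap(question: String, chunk: String): Int {
    val words = question.lowercase().split(Regex("\\W+")).toSet()
    return chunk.lowercase().split(Regex("\\W+")).count { it in words }
}

// Answer a question by prepending the best-matching chunks to the prompt.
// `generate` abstracts the on-device model call, e.g. LlmInference's
// generateResponse shown below.
fun answer(
    question: String,
    chunks: List<String>,
    topK: Int = 3,
    generate: (String) -> String
): String {
    val context = chunks
        .sortedByDescending { overlap(question, it) }
        .take(topK)
        .joinToString("\n---\n")
    return generate("Answer using only this context:\n$context\n\nQuestion: $question")
}
```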

Gemma 3 1B can be fine-tuned through a variety of methods, including training on a synthetic reasoning dataset, LoRA adapters, and more. Google provides a ready-to-use Colab notebook showing how to combine those two methods and then convert the resulting model to the LiteRT format, the new name for the TensorFlow Lite format.

To make it easier for developers to integrate Gemma 3, Google also provides a sample chat app for Android showing how to use the model for text generation, information retrieval and summarization, email drafting, and more. The app uses the MediaPipe LLM Inference API, although the model can also be integrated using the LiteRT stack directly.
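For the MediaPipe route, the integration is compact. The sketch below uses the LLM Inference API from the com.google.mediapipe:tasks-genai artifact; the model path and file name are assumptions and depend on where the app downloads the Gemma 3 1B bundle:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Build the LLM Inference task around a locally stored Gemma 3 1B bundle.
// The path is an assumption: in practice the app first downloads the .task
// file (e.g. from Hugging Face) into its own storage.
fun createGemma(context: Context): LlmInference {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma3-1b-it-int4.task")
        .setMaxTokens(512) // combined budget for prompt and response
        .build()
    return LlmInference.createFromOptions(context, options)
}

// Blocking single-shot generation; a chat UI would typically stream
// partial results via the API's asynchronous variant instead.
fun draftReply(llm: LlmInference, email: String): String =
    llm.generateResponse("Draft a short, polite reply to this email:\n$email")
```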

No Gemma 3 sample app is available for iOS yet: since the MediaPipe LLM Inference API for iOS does not yet support the new model, Google only provides an older sample app based on Gemma 2.

Google provided performance figures showing that Gemma 3 1B significantly outperforms Gemma 2 2B while requiring only 20% of its deployment size. As Google engineers explain, these improvements were achieved through extensive optimizations: quantization-aware training, improved KV cache performance, faster loading thanks to optimized weight layouts, and weight sharing across the prefill and decode phases.

While these optimizations apply to all open-weight models, not just Gemma, actual results may vary considerably depending on the device running the model and on its runtime configuration.

Gemma 3 1B can run on either the CPU or the GPU of a mobile device; for best performance, the device should have at least 4GB of memory. The model is available for download from Hugging Face under Google's usage license.
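On Android, the choice of processor can be steered when the MediaPipe task is created. Recent tasks-genai releases expose a preferred-backend option on the builder; this is an assumption worth verifying against the MediaPipe version you ship:

```kotlin
// Ask for the GPU delegate explicitly; MediaPipe can fall back to the CPU
// on devices without a capable GPU. Model path as in the earlier snippet.
val gpuOptions = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/data/local/tmp/llm/gemma3-1b-it-int4.task")
    .setPreferredBackend(LlmInference.Backend.GPU)
    .build()
```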
