Google Open Sources 27B Parameter Gemma 2 Language Model

Google DeepMind recently open-sourced Gemma 2, the next generation of their family of small language models. Google made several improvements to the Gemma architecture and used knowledge distillation to give the models state-of-the-art performance: Gemma 2 outperforms other models of comparable size and is competitive with models 2x larger.

Gemma 2 improves on the first-generation Gemma architecture by incorporating ideas from Google's flagship model Gemini, including a Grouped-Query Attention (GQA) mechanism and a mix of global attention and local sliding window attention. Google trained Gemma 2 at three sizes: two billion, nine billion, and 27 billion parameters. The two smaller models were trained using knowledge distillation, with a larger language model acting as teacher. When evaluated on LLM benchmarks such as MMLU, GSM8K, and Winogrande, the 27B parameter Gemma 2 model outperformed the baseline Qwen1.5 32B model and was "only a few percent below" the much larger 70B parameter Llama 3. According to Google,
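
For readers unfamiliar with the mechanism, grouped-query attention reduces the memory traffic of decoding by letting several query heads share a single key/value head, shrinking the KV cache. The following PyTorch sketch is a generic illustration of that idea under assumed tensor shapes, not Gemma 2's actual implementation:

    import torch

    def grouped_query_attention(q, k, v, num_kv_heads):
        # q: (batch, num_q_heads, seq_len, head_dim)
        # k, v: (batch, num_kv_heads, seq_len, head_dim), num_kv_heads < num_q_heads
        batch, num_q_heads, seq_len, head_dim = q.shape
        group_size = num_q_heads // num_kv_heads
        # Each KV head is repeated to serve a whole group of query heads,
        # so far fewer K/V projections need to be cached during decoding.
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)
        scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
        return scores.softmax(dim=-1) @ v

    # Hypothetical shapes: 8 query heads sharing 2 KV heads
    q = torch.randn(1, 8, 16, 64)
    k = torch.randn(1, 2, 16, 64)
    v = torch.randn(1, 2, 16, 64)
    out = grouped_query_attention(q, k, v, num_kv_heads=2)  # shape (1, 8, 16, 64)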

We show that distillation is an effective method for training these models, and the benefits distillation confers over raw text training. Specifically, we show how training over output probabilities can produce superior results over purely next token prediction. We hope that releasing these models to the community will unlock access to capabilities previously only seen in large-scale LLMs and fuel future waves of research and development.
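
The point about "training over output probabilities" can be made concrete with the standard knowledge-distillation loss: instead of a one-hot next-token target, the student model is trained to match the teacher's full distribution over the vocabulary at each position. The sketch below is a common textbook formulation in PyTorch, with an illustrative temperature value; consult the paper for Gemma 2's exact objective:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Logits have shape (batch, seq_len, vocab_size); the temperature
        # of 2.0 is an illustrative choice, not a value from the paper.
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        # KL(teacher || student); the t**2 factor keeps gradient magnitudes
        # comparable across temperature settings.
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * t * t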

The Gemma 2 release continues the industry trend of small, openly available language model families, such as Microsoft's Phi and Meta's Llama. These models have incorporated architectural improvements like GQA, along with high-quality training data, to achieve better performance than would be expected from a small model.

Besides evaluating Gemma 2 against common benchmarks, Google also submitted instruction-tuned versions of the 27B and the 9B model to the Chatbot Arena, where models are pitted against each other in "blind side by side evaluations" by human judges. Gemma 2 27B is currently the highest ranked open model, edging out Llama 3 70B. The 9B version is also doing well, and according to Google, "strongly outperforms all other models in the same range of parameters."

AI researcher Sebastian Raschka commented on Google's Gemma 2 research paper in a thread on X. Raschka highlighted several noteworthy features, but also said, "It would be interesting to see a comparison with the more recent Qwen 2 model." In a discussion about Gemma 2 on Hacker News, several commenters praised the model's performance. One noted:

It's multilingual. Genuinely. Compared my results with some people on reddit and the consensus is that the 27B is near perfect in a few obscure languages and likely perfect in most common ones. The 9B is not as good but it's still coherent enough to use in a pinch. It's literally the first omni-translation tool that actually works that you can run offline at home. I'm amazed that Google mentioned absolutely nothing about this in their paper.

Users can access Gemma 2 models over the web via Google's AI Studio or in Google Cloud Platform's Vertex AI. The 9B and 27B Gemma 2 models are available for download from Hugging Face and Kaggle, and Google says the 2B model will be available soon. The models are released under Google's "commercially-friendly" Gemma license. Google also published a cookbook with "guides and examples" for using Gemma 2.
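
As an illustration of the download path, the snippet below loads the instruction-tuned 9B checkpoint with the Hugging Face transformers library. The model id reflects Hugging Face's naming at release time, access to the gated repository must be granted first, and the prompt is arbitrary:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-9b-it"  # gated repo: accept the license on Hugging Face first
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the accelerate package
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "Summarize grouped-query attention in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))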
