The Allen Institute for AI (Ai2) research team has introduced OLMo 2, a new family of open-source language models available in 7 billion (7B) and 13 billion (13B) parameter configurations. Trained on up to 5 trillion tokens, these models improve training stability, adopt a staged training process, and draw on more diverse datasets.
OLMo 2's architecture improves on layer normalization by employing RMSNorm, and adds rotary positional embeddings and Z-loss regularization to enhance model robustness. The training process followed a two-stage curriculum: the first stage used the OLMo-Mix-1124 dataset, comprising 3.9 trillion tokens drawn from high-quality sources such as DCLM and Starcoder; the second stage fine-tuned the models on Dolmino-Mix-1124, a curated dataset of 843 billion tokens featuring web-based and domain-specific content.
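To make these two stabilization techniques concrete, the sketch below shows a minimal PyTorch version of RMSNorm and a Z-loss auxiliary penalty added to the standard cross-entropy objective. This is illustrative only, not Ai2's implementation; the epsilon, the Z-loss coefficient, and the dummy vocabulary size are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization: rescales by the RMS of each
    vector, with no mean subtraction and no bias term."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multiply by the reciprocal root-mean-square of the last dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms

def z_loss(logits: torch.Tensor, coeff: float = 1e-4) -> torch.Tensor:
    """Z-loss penalizes the log of the softmax normalizer Z = sum(exp(logits)),
    discouraging logits from drifting to extreme values during training."""
    log_z = torch.logsumexp(logits, dim=-1)
    return coeff * (log_z ** 2).mean()

# Usage: the penalty is simply added to the usual language-modeling loss.
logits = torch.randn(4, 32000)            # (batch, vocab) dummy logits
targets = torch.randint(0, 32000, (4,))   # dummy next-token targets
loss = nn.functional.cross_entropy(logits, targets) + z_loss(logits)
```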
Techniques like model souping, which averages the weights of multiple checkpoints into a single model to optimize performance, were crucial in producing the final 7B and 13B models. The performance of OLMo 2 sets new benchmarks in open-source language modeling, demonstrating a significant boost across all evaluation tasks compared to its predecessor, OLMo-0424.
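A minimal sketch of what checkpoint souping can look like is shown below: a uniform average of parameter tensors across several saved checkpoints. The file paths and the choice of a simple uniform average are assumptions for illustration, not the exact recipe used for OLMo 2.

```python
import torch

def soup(checkpoint_paths):
    """Average the parameters of several checkpoints into one 'souped' state dict.
    Assumes all checkpoints share the same architecture and parameter names."""
    state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
    averaged = {}
    for name in state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in state_dicts])
        averaged[name] = stacked.mean(dim=0)
    return averaged

# Hypothetical usage with placeholder checkpoint files:
# merged = soup(["ckpt_run_a.pt", "ckpt_run_b.pt", "ckpt_run_c.pt"])
# model.load_state_dict(merged)
```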
Notably, OLMo 2 7B outperforms Llama-3.1 8B, and OLMo 2 13B surpasses Qwen 2.5 7B, despite utilizing fewer training FLOPs. Evaluation using the Open Language Modeling Evaluation System (OLMES), a suite of 20 benchmarks, confirmed these gains, highlighting strengths in knowledge recall, reasoning, and general language capabilities.
The development of OLMo 2 marks a significant shift in the language modeling landscape, addressing challenges such as training stability and evaluation transparency. By setting a new standard for open-source AI, these models demonstrate the potential of collaborative innovation in advancing artificial intelligence, paving the way for more equitable technological advancements.
The AI community has responded enthusiastically to OLMo 2's launch, recognizing Ai2 for its commitment to open source.
AI researcher Constantine Dee commented on X:
Ai2 has unveiled OLMo 2, the world's leading open-source AI model. Built with transparent datasets and training, it's a game-changer for creating diverse content.
While user Billy462 shared on Reddit:
This release is extremely significant. For those that don't know Allen AI are a research institute who are releasing completely open models. That means that all of their results can be reproduced (and improved upon) from scratch.
The OLMo 2 models are available, along with their weights, data, code, recipes, and intermediate checkpoints. The introduction of OLMES provides structured benchmarks to guide model development and track progress effectively. Additionally, post-training methodologies, including supervised fine-tuning, preference tuning, and reinforcement learning with verifiable rewards, have enhanced the models' instruction-following capabilities.
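Since the weights are openly released, trying the models locally is straightforward with the Hugging Face transformers library. The snippet below is a minimal sketch; the repository id follows Ai2's naming on Hugging Face but should be checked against the official release page, and the prompt is only a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Presumed repository id for the 7B base model; verify on Ai2's Hugging Face page.
model_id = "allenai/OLMo-2-1124-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```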