Mistral AI recently released Ministral 3B and Ministral 8B, two small language models that are collectively called les Ministraux. The models are designed for local inference applications and outperform other comparably sized models on a range of LLM benchmarks.
Both models have a base and an instruct version, and both have a 128k context length. Ministral 8B uses interleaved sliding-window attention, which provides faster and more efficient inference. The release comes just over one year after Mistral AI announced its first model, Mistral 7B. Unlike that first model, which was released under the Apache 2.0 license for unrestricted use, les Ministraux require a commercial license for self-deployment; the 8B model's weights are also available for research purposes. The models are also available through Mistral AI's API. According to Mistral AI:
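For developers who want to try the models through the hosted API, the following is a minimal sketch using Mistral AI's official Python client (pip install mistralai). The model identifier ministral-8b-latest is an assumption based on Mistral AI's naming conventions; check the API documentation for the exact published name.

```python
import os

# Mistral AI's official Python client (v1 SDK).
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# "ministral-8b-latest" is an assumed model identifier; consult the
# Mistral AI docs for the exact name of the hosted Ministral models.
response = client.chat.complete(
    model="ministral-8b-latest",
    messages=[{"role": "user", "content": "Summarize the benefits of on-device inference."}],
)

print(response.choices[0].message.content)
```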
Our most innovative customers and partners have increasingly been asking for local, privacy-first inference for critical applications such as on-device translation, internet-less smart assistants, local analytics, and autonomous robotics. Les Ministraux were built to provide a compute-efficient and low-latency solution for these scenarios... Used in conjunction with larger language models such as Mistral Large, les Ministraux are also efficient intermediaries for function-calling in multi-step agentic workflows. They can be tuned to handle input parsing, task routing, and calling APIs based on user intent across multiple contexts at extremely low latency and cost.
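The function-calling pattern Mistral AI describes can be sketched with the same Python client, which exposes a JSON-schema tool interface in its chat API. In this hypothetical example, the small model parses the user's intent and selects a tool; the get_order_status function and the ministral-8b-latest identifier are illustrative assumptions, not part of Mistral AI's announcement.

```python
import json
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# A hypothetical tool definition in the JSON-schema format used by
# Mistral's function-calling chat interface.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a customer order",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

# The small model acts as the intermediary: it parses the request and
# emits a structured tool call instead of free-form text.
response = client.chat.complete(
    model="ministral-8b-latest",  # assumed identifier
    messages=[{"role": "user", "content": "Where is order 12345?"}],
    tools=tools,
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, json.loads(tool_call.function.arguments))
```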
Since its original Mistral 7B release, Mistral AI has developed several other specialized models, most of which are released under the Apache 2.0 license. Earlier this year, InfoQ covered Mixtral 8x7B, a sparse mixture of experts (SMoE) LLM with performance rivaling larger models such as Llama 2 70B and GPT-3.5. InfoQ also covered Codestral, its first code-focused AI model, and three open-weight models: Mistral NeMo, a 12B parameter general-purpose LLM; Codestral Mamba, a 7B parameter code-generation model; and Mathstral, a 7B parameter model fine-tuned for math and reasoning.
Mistral AI reports that les Ministraux models "consistently outperform their peers." They report metrics for benchmarks such as MMLU, Winogrande, and GSM8K, which show Ministral 3B outperforming Llama 3.2 3B and Gemma 2 2B, and Ministral 8B outperforming Llama 3.1 8B and Mistral AI's own Mistral 7B. The independent benchmarking site Artificial Analysis has evaluated both the 3B and 8B versions on a different set of benchmarks; overall, the models compare favorably to Llama and Gemma, especially on the HumanEval coding benchmark. Their results also show that les Ministraux have much faster inference speeds.
Ministraux Performance Comparison. Image Source: Mistral AI Blog
In a discussion about les Ministraux on Hacker News, several users lamented the models' requirement of a commercial license for locally-hosted use. Some users pointed out that the models are available via an API, and one noted that "In Europe, they are basically the only LLM API provider that is GDPR compliant." Other users questioned Mistral AI's ability to compete with larger model creators such as Meta. Lee Harris, head of R&D at Rev.AI, wrote:
I think for Mistral to compete with Meta they need a better API. The on-prem/self-hosted people will always choose the best models for themselves and you won't be able to monetize them in a FOSS world anyways, so you just need the better platform. Right now, Meta isn't providing a top-tier platform, but that may eventually change.
The model weights for Ministral 8B Instruct can be downloaded from Hugging Face for research purposes.
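For readers who want to run the instruct model locally, the following is a minimal sketch using the Hugging Face transformers library. The repository id mistralai/Ministral-8B-Instruct-2410 is the name listed on Hugging Face at the time of writing; access is gated and requires accepting the research license, and a recent transformers version is assumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Ministral-8B-Instruct-2410"  # gated, research license

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "What is interleaved sliding-window attention?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```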