Meta recently unveiled its latest language model, Llama 3.1 405B. This AI model is the largest of the new Llama models, which also include 8B and 70B versions. With 405 billion parameters, trained on more than 15 trillion tokens using over 16,000 GPUs, Llama 3.1 405B offers a range of impressive features.
"We believe there are three key levers in the development of high-quality foundation models: data, scale, and managing complexity. We seek to optimize for these three levers in our development process. These improvements include the development of more careful pre-processing and curation pipelines for pre-training data and the development of more rigorous quality assurance and filtering approaches for post-training data." - Meta AI
After the announcement, several cloud vendors announced support for running Llama 3.1 405B, with launch providers including Databricks, Dell, Nvidia, IBM, Snowflake, Scale AI, and more. "Amazon Bedrock offers a turnkey way to build generative AI applications with Llama," Amazon wrote. "Microsoft is announcing Llama 3.1 405B available today through Azure AI’s Models-as-a-Service as a serverless API endpoint," Microsoft announced. "We’re excited to be one of Meta’s launch partners to make their newest Llama 3.1 8B model available," Cloudflare said. Groq noted that early API access to Llama 3.1 405B is currently available to select customers only.
The open-source models have a context window of 128k tokens, meaning users can enter hundreds of pages of content in their prompts. They are multilingual, with support for eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The models also support tool use for web search, math reasoning, and code execution, as sketched below.
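To give a sense of how the tool-use capability is exercised in practice, here is a minimal sketch that renders a chat prompt containing a tool definition via the Hugging Face transformers chat template. The model ID and the get_weather tool are illustrative assumptions rather than part of Meta's announcement, and the exact template behavior depends on the transformers version.

```python
# Sketch: rendering a Llama 3.1 Instruct chat prompt that includes a tool definition.
# Assumes access to the gated Hugging Face repo below; get_weather is a hypothetical tool.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct")

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in Paris today?"},
]

# Recent transformers releases allow tool schemas to be passed to the chat template,
# which embeds them in the prompt so the model can emit a structured tool call.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[weather_tool],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```

The same chat-template approach applies to the 8B and 70B Instruct models, which are far easier to run for experimentation.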
"Compared to prior versions of Llama (Touvron et al., 2023a,b), we improved both the quantity and quality of the data we use for pre-training and post-training. These improvements include the development of more careful pre-processing and curation pipelines for pre-training data and the development of more rigorous quality assurance and filtering approaches for post-training data. We pre-train Llama 3 on a corpus of about 15T multilingual tokens, compared to 1.8T tokens for Llama 2," Meta wrote.
One of the most significant aspects of the Llama 3.1 models is that they are open source. Users can download the weights and use them in their own applications. The 405B model's benchmark scores are close to, and sometimes even surpass, those of GPT-4o and Claude 3.5 Sonnet; the results can be seen in the model card.
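Since the weights are openly downloadable once Meta's license is accepted on the Hub, fetching them locally can be as simple as the sketch below. The repository ID follows Meta's Hugging Face naming and is an assumption here, and the gated repo requires an access token (for example via huggingface-cli login).

```python
# Sketch: downloading Llama 3.1 weights from the Hugging Face Hub.
# Assumes access to the gated repo has been granted and an auth token is configured.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-405B-Instruct",  # assumed repo ID
    allow_patterns=["*.json", "*.safetensors"],         # skip optional artifacts
)
print(f"Weights downloaded to: {local_dir}")
```

The 8B and 70B checkpoints follow the same naming scheme and are far more practical to pull and run on a single workstation.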
According to Scale AI’s SEAL leaderboard, Llama 3.1 405B ranks second in math and reasoning, fourth in coding, and first in instruction following. The exact performance will depend on the use case, but it is expected to be on par with the top closed LLMs.
"Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas." - Mark Zuckerberg
The release of Llama 3.1 405B is potentially the first time anyone can download a GPT-4-class large language model for free and run it on their own hardware. Users will still need powerful hardware, however: Meta says the model can run on a "single server node," which is well beyond the capabilities of a desktop PC, as the rough estimate below illustrates. The release is not just a technical achievement but also a strategic move in the AI industry.
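A back-of-the-envelope estimate of the memory needed just to hold the weights (ignoring activations and the KV cache) shows why a multi-GPU server node is the realistic floor. The 8x80 GB node size assumed below reflects a typical H100 server and is not a figure from Meta.

```python
# Rough estimate of memory required to hold 405B parameters at different precisions.
params = 405e9                                   # parameter count
bytes_per_param = {"fp16/bf16": 2, "fp8/int8": 1, "int4": 0.5}
node_memory_gb = 8 * 80                          # assumed 8x 80 GB GPU server node

for precision, nbytes in bytes_per_param.items():
    weight_gb = params * nbytes / 1e9
    verdict = "fits" if weight_gb <= node_memory_gb else "does not fit"
    print(f"{precision:>9}: ~{weight_gb:,.0f} GB of weights ({verdict} in a {node_memory_gb} GB node)")

# fp16/bf16: ~810 GB -> needs more than one node, or aggressive offloading
# fp8/int8 : ~405 GB -> fits in a single 640 GB node, consistent with running
#                       a quantized 405B on a "single server node"
# int4     : ~202 GB -> fits with headroom for activations and the KV cache
```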
It is worth noting that these models are not multimodal and do not understand or create images. Meta has promised that multimodal Llamas are on the way. Developers interested in learning more about the model can find it on the HuggingFace Hub or read the technical paper.