
Meta Releases Llama 3.2 with Vision, Voice, and Open Customizable Models

Meta recently announced Llama 3.2, the latest version of its open-source language model family, which adds vision, voice, and open customizable models. This is the first multimodal release in the series, allowing users to interact with visual data, for example by identifying objects in photos or editing images with natural-language commands.

The new release includes vision models with 11 billion and 90 billion parameters, as well as lightweight text-only models with 1 billion and 3 billion parameters designed to run efficiently on edge and mobile devices. Llama 3.2 models support an extended context length of up to 128K tokens, positioning them as state-of-the-art in their class for tasks such as summarization, instruction following, and text rewriting.

Works great on documents, OCR, complex graphs... I asked the 11B model what was funny about this image - it was able to pick the humour and even the paper details! - Sanyam Bhutani

This release is part of Meta’s ongoing commitment to openness, offering both pre-trained and instruction-tuned versions that developers can fine-tune for custom applications using tools like torchtune and torchchat. The models are available for immediate download on platforms like Hugging Face and Meta's own website, and they can be deployed across a broad ecosystem of partner platforms, including major cloud providers like AWS, Google Cloud, and Microsoft Azure.
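
As an illustration of what this looks like in practice, the instruction-tuned checkpoints can be loaded straight from Hugging Face with the transformers library. The snippet below is a minimal sketch, assuming access to the gated meta-llama/Llama-3.2-3B-Instruct repository and a recent transformers release; it is not taken from Meta's announcement.

# Minimal sketch: text generation with the 3B instruction-tuned model via transformers.
# Assumes access has been granted to the gated meta-llama/Llama-3.2-3B-Instruct repository.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the key features of Llama 3.2 in two sentences."},
]

# The pipeline applies the model's chat template to the message list before generating.
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])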

The vision models, which are the first in the Llama series to support image reasoning, can handle complex tasks such as document-level understanding, image captioning, and visual grounding. The lightweight 1B and 3B models are particularly noteworthy for their ability to run on mobile devices, offering instant responses and enhanced privacy by processing data locally. These models are also capable of tool calling, making them ideal for personalized, on-device applications.
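
As a sketch of how image reasoning can be exercised, the snippet below follows the usage pattern published on the Hugging Face model cards for the vision checkpoints; the model ID, the placeholder image URL, and the Mllama classes are assumptions drawn from that documentation rather than from Meta's announcement.

# Sketch: image reasoning / captioning with the 11B vision model.
# Assumes the meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint and the
# Mllama classes that transformers added for Llama 3.2 Vision.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Any local or remote image works here; the URL is just a placeholder.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe what this chart shows."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))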

Meta today launched the Llama 3.2 family of models and I really like the new tiny 3b model. You can run it locally on your laptop, it's fast and pretty good - Guido Appenzeller

The training process for these models involved multiple stages, starting from pre-trained Llama 3.1 text models and incorporating image adapters and encoders. Post-training involved several rounds of alignment, including supervised fine-tuning and rejection sampling, to ensure the models are both helpful and safe. Meta also employed synthetic data generation to enhance the quality of fine-tuning data.

“It’s sort of like the Linux of AI, and we’re seeing closed-source labs react by trying to slash their prices to compete with Llama,” Mark Zuckerberg, the CEO of Meta, said. The new multimodal models will not be available in the EU, a restriction Meta attributes to the region's regulatory environment.

Meta has introduced Llama Stack distributions to simplify the deployment of these models in various environments, from single-node setups to cloud and on-device applications. This includes a command line interface, client code in multiple languages, and Docker containers, providing a consistent and streamlined experience for developers. The stack supports both local and cloud-based implementations, allowing flexibility in choosing between running models locally or utilizing cloud services. Developers can install the stack via PyPI and configure it using a series of interactive commands, with support for both Conda environments and Docker images.
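
As a rough sketch of the developer experience, the snippet below queries a locally running Llama Stack distribution from Python; the llama-stack-client package, the port, and the exact method signature are assumptions based on Meta's early examples and may differ between releases.

# Rough sketch: querying a locally running Llama Stack distribution from Python.
# Assumes `pip install llama-stack-client` and a stack already started
# (for example via the llama CLI or a Docker image) on port 5000;
# the exact client API is an assumption and may change between releases.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about open models."}],
)
print(response.completion_message.content)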

Safety remains a priority, with new updates to the family of safeguards, including Llama Guard 3 for vision capabilities and optimized versions for lightweight models. These safeguards are integrated into reference implementations and are available for the open-source community to use.
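
For context, a Llama Guard checkpoint is used like any other causal language model: the conversation to be moderated is wrapped in a safety-classification prompt by the model's chat template, and the model replies with a verdict. The sketch below assumes the meta-llama/Llama-Guard-3-1B checkpoint and the safe/unsafe output convention described on its model card.

# Sketch: moderating a user prompt with the lightweight Llama Guard 3 1B model.
# Assumes the meta-llama/Llama-Guard-3-1B checkpoint; its chat template wraps the
# conversation in the safety-classification prompt, and the model answers with
# "safe" or "unsafe" plus the violated category codes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "How do I pick a lock?"}]}
]
inputs = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)

output = model.generate(inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
verdict = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
print(verdict)  # e.g. "safe" or "unsafe" followed by a category code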

Developers interested in learning more about Llama 3.2 can find additional resources on GitHub, including model evaluations and model cards for the text and vision models.
