Meta Open-Sources Byte Latent Transformer LLM with Improved Scalability

Meta open-sourced the Byte Latent Transformer (BLT), an LLM architecture that replaces the tokenizer with a learned, dynamic scheme for grouping raw bytes into patches. This allows BLT models to match the performance of Llama 3 models while using up to 50% fewer inference FLOPS.

Most LLMs map text bytes onto a fixed set of tokens, which has several drawbacks, including the famous strawberry problem. By contrast, BLT dynamically groups bytes into patches. It uses a small language model to compute the entropy of the next byte in a sequence and starts a new patch when the entropy increases; essentially, the small model predicts the end of a word, a relatively easy task compared to generating the next word in a sequence (a simplified sketch of this patching scheme follows the quote below). Because BLT works directly on bytes, it is more robust to noisy inputs such as text with spelling mistakes. Increasing the average patch size reduces the FLOPS needed for inference, leaving room for a larger, better-performing model within the same compute budget. According to Meta,

BLT unlocks a new dimension for scaling, allowing simultaneous increases in model and patch size within a fixed inference budget. This new paradigm becomes advantageous for compute regimes commonly encountered in practical settings. While directly engaging with raw byte data, BLT also improves the model’s ability to handle the long-tail of data, offering significant improvements in robustness to noisy inputs and a deeper understanding of sub-word structures. Overall, these results position BLT as a promising alternative to traditional tokenization-based approaches, providing a scalable and robust framework for more efficient and adaptable language models.
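The patching idea can be illustrated with a short Python sketch. This is a simplified, hypothetical version, not Meta's implementation: the helper names, the toy next-byte model, and the global entropy threshold are assumptions standing in for BLT's learned small language model and its entropy rule.

```python
import math
from typing import Callable, List, Sequence

def next_byte_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def toy_next_byte_probs(prefix: bytes) -> List[float]:
    """Toy stand-in for BLT's small byte-level LM: pretends the next byte is
    hard to predict right after whitespace, i.e. at the start of a new word."""
    if prefix and prefix[-1] in b" .,\n":
        return [1 / 256] * 256                 # flat distribution: 8 bits of entropy
    probs = [0.001] * 256
    probs[ord("e")] = 1.0 - 0.001 * 255        # peaked distribution: low entropy
    return probs

def entropy_patches(
    data: bytes,
    next_byte_probs: Callable[[bytes], Sequence[float]],
    threshold: float = 4.0,                    # illustrative value, not from the paper
) -> List[bytes]:
    """Group bytes into patches, starting a new patch whenever the small
    model's next-byte entropy crosses the threshold."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(next_byte_probs(data[:i])) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

print(entropy_patches(b"byte latent transformer", toy_next_byte_probs))
# [b'byte ', b'latent ', b'transformer']
```

Each patch is then embedded and processed as a single step by the large latent transformer, so longer patches mean fewer expensive forward passes per byte of input.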

Most LLMs, like Llama, operate on a fixed vocabulary of tokens, with sequences of input bytes mapped onto tokens using heuristics. Tokenization is needed because training an LLM directly on raw bytes would require too much computation, but it has disadvantages. Besides struggling to count individual letters in words, tokenized models can have trouble handling multiple languages and understanding mistyped words.
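The contrast between the token view and the byte view is easy to see in a few lines of Python. This is only an illustration: it assumes the third-party tiktoken package, whose BPE vocabulary is OpenAI's rather than Llama's, as a stand-in for token-based vocabularies in general.

```python
import tiktoken  # pip install tiktoken; any BPE tokenizer makes the same point

text = "strawberry"

# Byte-level view (what BLT consumes): every character is an explicit symbol,
# so questions like "how many r's?" are trivial.
byte_seq = text.encode("utf-8")
print(list(byte_seq))             # one integer per character
print(byte_seq.count(ord("r")))   # 3

# Token-level view (what most LLMs consume): the model only sees opaque token
# IDs, and the individual letters are hidden inside multi-character tokens.
enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode(text)
print(ids)                                              # token IDs
print([enc.decode_single_token_bytes(t) for t in ids])  # multi-byte chunks
```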

Meta did a series of experiments evaluating BLT against token-based models. They found that while a fixed inference compute budget determines a token-based model's size, increasing the patch size lets a BLT model grow larger within the same budget and therefore achieve better accuracy. They also found that BLT models outperformed Llama 3 on character-level tasks, such as handling noisy input or translating low-resource languages. However, when the researchers tried converting a Llama 3 model to BLT, instead of training a new model end-to-end, they found that it had a "significant" drop in performance on several LLM benchmarks.
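A rough back-of-envelope calculation shows why larger patches leave room for a larger model at a fixed inference budget. The numbers below are illustrative assumptions, not figures from Meta's paper, and the cost of BLT's small byte-level encoder and decoder is ignored; the only approximation used is the common rule that a forward pass costs about 2 FLOPS per parameter.

```python
# Fixed inference budget, expressed in FLOPs per byte of input text.
# 4e9 FLOPs/byte is roughly what an 8B-parameter token model spends if one
# token covers ~4 bytes (2 * 8e9 params / 4 bytes); purely illustrative.
BUDGET_FLOPS_PER_BYTE = 4e9

def max_params(patch_size_bytes: float) -> float:
    """Largest latent-transformer size affordable at the fixed budget when one
    forward pass (~2 * params FLOPs) is amortized over a whole patch."""
    return BUDGET_FLOPS_PER_BYTE * patch_size_bytes / 2

for patch in (4, 6, 8):
    print(f"patch size {patch} bytes -> ~{max_params(patch) / 1e9:.0f}B params")
# patch size 4 bytes -> ~8B params
# patch size 6 bytes -> ~12B params
# patch size 8 bytes -> ~16B params
```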

In a discussion about BLT on Reddit, several users pointed out how BLT could help models solve the "strawberry problem." Another user wrote:

[BLT] is 100% the way to go. Also makes multimodality easy since you can just represent any data or file in bytes, and there exist A LOT of files. One problem is that 2 MB would need a context size of 2 million, so the memory and compute requirements are not quite met yet.

The BLT training and inference code is available on GitHub.
