Meta AI Reveals CM3leon, an Advanced Text-to-Image Generative Model

Meta AI has introduced CM3leon, a multimodal model that generates both text and images. Meta describes it as the first of its kind to be trained with a recipe adapted from text-only language models, delivering strong results with notably low computational cost.

The new model achieves state-of-the-art text-to-image generation while using five times less compute than earlier transformer-based methods. It combines the adaptability of autoregressive models with low training cost and efficient inference. As a causal masked mixed-modal (CM3) model, CM3leon extends prior work by generating sequences of text and images conditioned on arbitrary sequences of other text and image content.

CM3leon thus offers the power and flexibility characteristic of autoregressive models together with cost-effective training and inference. This overcomes a limitation of previous models, which were restricted to performing either text generation or image generation exclusively.
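To make the causal masked mixed-modal objective more concrete, the sketch below is a simplified, hypothetical illustration rather than Meta's implementation: it interleaves a caption with placeholder image tokens, removes a random span, and moves that span to the end behind a sentinel token, so a left-to-right decoder can be trained both to continue a sequence and to infill the missing piece.

```python
import random

# Toy mixed-modal vocabulary: text tokens are strings, image tokens are
# placeholders standing in for discrete codes from an image tokenizer.
# All names here are hypothetical; this is not Meta's code.
text_tokens = ["a", "photo", "of", "a", "cat"]
image_tokens = [f"<img_{i}>" for i in range(8)]  # e.g. VQ codebook indices

MASK = "<mask>"   # sentinel marking where a span was removed
SEP = "<infill>"  # separates the visible context from the moved span


def cm3_example(sequence, span_len=3, seed=0):
    """Build one CM3-style training example: cut a random span out of the
    interleaved sequence and append it after a sentinel, so a causal
    (left-to-right) decoder can learn to regenerate ("infill") it."""
    rng = random.Random(seed)
    start = rng.randrange(0, len(sequence) - span_len)
    span = sequence[start:start + span_len]
    visible = sequence[:start] + [MASK] + sequence[start + span_len:]
    return visible + [SEP] + span


# Interleave a caption with image tokens to form one mixed-modal document.
mixed = text_tokens + image_tokens
print(cm3_example(mixed))
# The decoder is trained to predict every token left to right, which lets it
# both continue a prompt (generation) and fill in the masked span (infilling).
```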

CM3leon's architecture uses a decoder-only transformer similar to well-established text-based models. What sets CM3leon apart is its ability to take in and generate both text and images, which enables it to handle a variety of tasks in which both the prompt and the model's output can mix the two modalities.
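A minimal sketch of such a decoder-only setup, assuming a single shared vocabulary of text tokens and discrete image codes, could look like the PyTorch snippet below. All sizes and names are hypothetical and do not reflect CM3leon's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; CM3leon's real tokenizers and dimensions are not shown.
TEXT_VOCAB = 1000        # subword tokens
IMAGE_VOCAB = 8192       # discrete image codes from a VQ-style tokenizer
VOCAB = TEXT_VOCAB + IMAGE_VOCAB  # one shared vocabulary for both modalities
D_MODEL, N_HEADS, N_LAYERS, MAX_LEN = 256, 4, 2, 128


class TinyMixedModalDecoder(nn.Module):
    """Decoder-only transformer: text and image tokens share one embedding
    table and one output head, so the same model can emit either modality."""

    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(MAX_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            D_MODEL, N_HEADS, dim_feedforward=4 * D_MODEL, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids):                      # ids: (batch, seq)
        seq = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(seq, device=ids.device))
        # Additive causal mask: each position may only attend to the past,
        # which makes this encoder stack behave as a decoder-only model.
        mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(x)                      # next-token logits over VOCAB


# A caption followed by image codes, packed into one causal sequence.
ids = torch.randint(0, VOCAB, (1, 16))
logits = TinyMixedModalDecoder()(ids)
print(logits.shape)  # torch.Size([1, 16, 9192])
```

Because text and image tokens live in the same vocabulary, the same causal decoder can continue a text prompt with image codes, or continue image codes with text.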

According to Meta's research on autoregressive multimodal models, diffusion models have recently come to dominate image generation because of their strong performance and relatively modest computational cost. Token-based autoregressive models are also known to produce strong results, with better global image coherence in particular, but they are significantly more expensive to train and to run at inference time.
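Much of that inference cost comes from decoding: an autoregressive model produces an image as a long sequence of discrete codes, one forward pass per token, whereas a diffusion model runs a fixed number of denoising steps over the whole image. The rough, hypothetical sketch below (with a random stand-in for a trained model) shows why the autoregressive loop is sequential and therefore costly.

```python
import torch

IMAGE_TOKENS = 1024  # e.g. a 32x32 grid of discrete image codes (assumed)
VOCAB = 8192


def fake_model(seq):
    """Stand-in for a trained decoder: returns next-token logits."""
    return torch.randn(VOCAB)


def generate_image_tokens(prompt_ids):
    seq = list(prompt_ids)
    for _ in range(IMAGE_TOKENS):          # one forward pass per image token
        logits = fake_model(seq)
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
        seq.append(next_id)
    return seq[len(prompt_ids):]           # the generated image codes


codes = generate_image_tokens(prompt_ids=[1, 2, 3])
print(len(codes))  # 1024 sequential sampling steps per image
```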

Generative models are becoming increasingly complex. They are trained on millions of sample images to learn the relationship between visuals and text, but they may also reflect any biases present in that training data. While AI-generated images have become familiar through popular tools such as Stable Diffusion, DALL·E, and Midjourney, Meta AI's approach to building CM3leon, and the performance it promises, represent a notable step forward.
