

Stable Diffusion 3.5 Improves Text Rendering, Image Quality, Consistency, and More

Stability AI has released Stable Diffusion 3.5 Large, its most powerful text-to-image generation model to date, and Stable Diffusion 3.5 Large Turbo, with special emphasis on customizability, efficiency, and flexibility. Both models are free to use for non-commercial and limited commercial purposes.

Stable Diffusion 3.5 Large is an 8-billion-parameter model that can generate professional images at 1 megapixel resolution, says Stability AI. Stable Diffusion 3.5 Large Turbo is a distilled version of Stable Diffusion 3.5 Large that gains speed by reducing the number of required sampling steps to just four. Both models, says Stability AI, provide top-tier performance in prompt adherence and image quality.

One of the goals behind the Stable Diffusion 3.5 models is customizability, meaning that users can fine-tune the model or build customized workflows around it. To train LoRAs with Stable Diffusion 3.5, you can use the existing SD3 training script, with some additional caveats if you want it to work with quantization. Stable Diffusion 3.5 is also optimized to run on standard consumer hardware, according to Stability AI, and to produce diverse outputs, including skin tones, 3D images, photography, painting, and so on.

Stable Diffusion 3.5 follows Stable Diffusion 3 Medium, released last June, which drew criticism in several areas, including its ability to accurately depict human anatomy, hands in particular. In the 3.5 release announcement, Stability AI acknowledged the community's dissatisfaction and made clear that 3.5 is not a quick fix but a step forward in Stable Diffusion's evolution. Still, while Stable Diffusion 3.5 fixes the known issues with prompts such as "girls lying in the grass", it may still fail on apparently basic prompts.

As Stability AI explains, Stable Diffusion 3.5 has a similar architecture to SD3's, with two major changes: the use of QK normalization and of double attention layers.
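To make the first of those changes more concrete, here is a minimal, illustrative sketch of QK normalization in single-head attention, written in plain Python. It is not Stability AI's implementation: the article does not say which normalization is used, so RMS normalization without a learnable gain is assumed here, and the function names `rms_norm`, `attention_logits`, and `attention` are hypothetical.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Rescale a vector so its root-mean-square is about 1.

    Assumption: RMS normalization with no learnable gain; a real model
    may use a different normalization or add trainable parameters.
    """
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_logits(queries, keys, qk_norm=True):
    """Scaled dot products between every query and every key.

    With qk_norm=True, queries and keys are normalized first, so the
    logits stay bounded no matter how large the raw activations are.
    """
    if qk_norm:
        queries = [rms_norm(q) for q in queries]
        keys = [rms_norm(k) for k in keys]
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        for q in queries
    ]


def attention(queries, keys, values, qk_norm=True):
    """Single-head attention: softmax over logits, then mix the values."""
    outputs = []
    for row in attention_logits(queries, keys, qk_norm):
        weights = softmax(row)
        dim = len(values[0])
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
        )
    return outputs
```

With normalized queries and keys of dimension d, each logit is bounded in magnitude by roughly the square root of d, which keeps the softmax from saturating when activations grow large. That bounded range is the usual motivation given for the technique.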

As mentioned, Stable Diffusion 3.5 is released under a permissive license allowing free use for non-commercial projects, and for commercial purposes by creators whose total annual revenue is less than $1M. The free "community" license expressly forbids creating competing foundational models. While this could sound too restrictive, custom models built with common customization techniques such as LoRAs, hypernetworks, fine-tunes, retrains, and retrains from scratch are not considered "foundational".

Later this month, Stability AI is going to release Stable Diffusion 3.5 Medium, a 2.5-billion-parameter model designed to run on consumer hardware. This will further enable the creation of custom models on a variety of hardware, albeit with a slight loss of output quality.

You can download the Stable Diffusion 3.5 inference code from GitHub, while the model itself is available on Hugging Face. You can also use the model on platforms like Replicate, ComfyUI, and DeepInfra, or directly through the Stability AI API.
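As a rough illustration of local use, the sketch below loads one of the models through the Hugging Face `diffusers` library and generates a single image. Several details are assumptions not stated in this article: the repository IDs, the 28-step and 3.5 guidance settings for the Large model, and the zero guidance scale for Turbo all follow the public model cards as of this writing and may change. Only the four-step figure for Turbo comes from the article. The helpers `generation_settings` and `generate` are hypothetical names.

```python
def generation_settings(turbo: bool) -> dict:
    """Return an assumed model ID and sampler settings for each variant."""
    if turbo:
        return {
            # Assumed Hugging Face repository ID for the Turbo variant.
            "model_id": "stabilityai/stable-diffusion-3.5-large-turbo",
            "num_inference_steps": 4,  # four steps, as reported in the article
            "guidance_scale": 0.0,     # assumption taken from the model card
        }
    return {
        # Assumed Hugging Face repository ID for the Large variant.
        "model_id": "stabilityai/stable-diffusion-3.5-large",
        "num_inference_steps": 28,     # assumption taken from the model card
        "guidance_scale": 3.5,         # assumption taken from the model card
    }


def generate(prompt: str, turbo: bool = True, out_path: str = "output.png") -> None:
    """Generate one image and save it; assumes a CUDA GPU is available."""
    # Imported lazily so the settings helper works without these packages.
    import torch
    from diffusers import StableDiffusion3Pipeline

    settings = generation_settings(turbo)
    pipe = StableDiffusion3Pipeline.from_pretrained(
        settings["model_id"], torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")
    image = pipe(
        prompt,
        num_inference_steps=settings["num_inference_steps"],
        guidance_scale=settings["guidance_scale"],
    ).images[0]
    image.save(out_path)
```

Calling `generate("a capybara holding a sign that reads Hello")` would write `output.png` to the current directory. Downloading the weights may first require accepting the license on the model page and authenticating with Hugging Face.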
