Stable Diffusion 3.5 Improves Text Rendering, Image Quality, Consistency, and More

Stability AI has released Stable Diffusion 3.5 Large, its most powerful text-to-image generation model to date, and Stable Diffusion 3.5 Large Turbo, with special emphasis on customizability, efficiency, and flexibility. Both models are available under a free license for non-commercial and limited commercial use.

Stable Diffusion 3.5 Large is an eight-billion-parameter model that can generate professional images at 1-megapixel resolution, according to Stability AI. Stable Diffusion 3.5 Large Turbo is a distilled version of Stable Diffusion 3.5 Large that gains speed by reducing the number of required inference steps to just four. Both models, says Stability AI, provide top-tier performance in prompt adherence and image quality.

One of the goals behind the Stable Diffusion 3.5 models is customizability, that is, the ability for users to fine-tune the model or build customized workflows. To train LoRAs with Stable Diffusion 3.5, you can use the existing SD3 training script, with some additional caveats if you want it to work with quantization. Stable Diffusion 3.5 is also optimized to run on standard consumer hardware, according to Stability AI, and to produce diverse outputs, covering a range of skin tones and styles such as 3D images, photography, and painting.

Stable Diffusion 3.5 follows Stable Diffusion 3 Medium, released last June, which drew criticism in several areas, including its inability to accurately depict human anatomy, specifically hands. In the 3.5 release announcement, Stability AI acknowledged the community's dissatisfaction and made clear that 3.5 is not a quick fix but a step forward in Stable Diffusion's evolution. Still, while Stable Diffusion 3.5 fixes the known issues with "girls lying in the grass", it may still fail with apparently basic prompts.

As Stability AI explains, Stable Diffusion 3.5 has a similar architecture to SD3's, with two major changes: the use of QK normalization and double attention layers.
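To make the first of these changes more concrete, here is a minimal sketch of what QK normalization does inside an attention layer: queries and keys are normalized before their dot product, which keeps the pre-softmax scores bounded regardless of how large the raw activations are. This is an illustration only, not Stability AI's implementation. The choice of RMS normalization, the absence of learned scale parameters, and all function names below are assumptions made for the example.

```python
import math
import random


def rms_norm(vec, eps=1e-6):
    """Rescale a vector so its root-mean-square is (almost exactly) 1.

    Assumption: RMS normalization without a learned scale; the article does
    not say which normalization Stable Diffusion 3.5 applies.
    """
    mean_square = sum(x * x for x in vec) / len(vec)
    return [x / math.sqrt(mean_square + eps) for x in vec]


def attention_scores(queries, keys, qk_norm=True):
    """Return the pre-softmax score matrix q.k / sqrt(d) for one head."""
    if qk_norm:
        # QK normalization: normalize queries and keys before the dot product.
        queries = [rms_norm(q) for q in queries]
        keys = [rms_norm(k) for k in keys]
    d = len(queries[0])
    return [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        for q in queries
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data with deliberately large activations (standard deviation 50).
random.seed(0)
d = 8
queries = [[random.gauss(0, 50) for _ in range(d)] for _ in range(4)]
keys = [[random.gauss(0, 50) for _ in range(d)] for _ in range(4)]

raw = attention_scores(queries, keys, qk_norm=False)
normed = attention_scores(queries, keys, qk_norm=True)

max_raw = max(abs(s) for row in raw for s in row)
max_normed = max(abs(s) for row in normed for s in row)
print(f"largest raw score:        {max_raw:.1f}")
print(f"largest normalized score: {max_normed:.3f} (bound: {math.sqrt(d):.3f})")
```

With unit-RMS queries and keys, each score is at most sqrt(d) in magnitude, so the softmax cannot saturate simply because activations grew during training. This is the usual motivation given for the technique.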

As mentioned, Stable Diffusion 3.5 is released under a permissive license that allows free use for non-commercial projects, as well as for commercial purposes by creators whose total annual revenue is less than $1M. The free "community" license expressly forbids creating competing foundational models. While this may sound too restrictive, custom models built using common customization techniques such as LoRAs, hypernetworks, fine-tunes, and retrains from scratch are not considered "foundational".

Later this month, Stability AI plans to release Stable Diffusion 3.5 Medium, a 2.5-billion-parameter model designed to run on consumer hardware. This will further enable the creation of custom models on a variety of hardware, albeit with a slight loss of output quality.

You can download the Stable Diffusion 3.5 inference code from GitHub, while the model itself is available on Hugging Face. You can also use the model on platforms like Replicate, ComfyUI, and DeepInfra, or directly through the Stability AI API.
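As an illustration of local use, the sketch below loads the weights from Hugging Face through the diffusers library's StableDiffusion3Pipeline. Several details are assumptions not stated in this article: the repository ids, the 28-step and 3.5 guidance defaults for the Large model, and the zero guidance for Turbo are taken from the Hugging Face model cards as best recalled. The helper names are hypothetical. Only the four-step figure for Turbo comes from the text above. Running it requires a recent diffusers release, PyTorch, a CUDA GPU with sufficient memory, and having accepted the model license on Hugging Face.

```python
# Assumed Hugging Face repository ids (not given in the article).
MODEL_IDS = {
    False: "stabilityai/stable-diffusion-3.5-large",
    True: "stabilityai/stable-diffusion-3.5-large-turbo",
}


def generation_settings(turbo):
    """Return sampling settings for the Large or Large Turbo variant.

    Turbo uses four steps, as reported in the article. The other values are
    assumptions based on the model cards and may need tuning.
    """
    if turbo:
        return {"num_inference_steps": 4, "guidance_scale": 0.0}
    return {"num_inference_steps": 28, "guidance_scale": 3.5}


def generate(prompt, turbo=False):
    """Generate one image for the prompt and return it as a PIL image."""
    # Heavy dependencies are imported lazily so that the settings helper
    # above can be used without torch or diffusers installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_IDS[turbo], torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU
    result = pipe(prompt, **generation_settings(turbo))
    return result.images[0]
```

For example, generate("a capybara holding a sign that reads Hello", turbo=True).save("capybara.png") would produce an image with the four-step Turbo variant, trading some quality for speed.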

About the Author

Sergio De Simone
