Variational Autoencoder (VAE)
VAE is a model used to compress and decompress images. It learns a latent-space representation of input data and minimizes reconstruction error during training. Because the latent space is continuous and smooth, it can also be sampled to generate new outputs.
Why?
Popular image generation models work in compressed latent space for faster inference and lower memory use. That means VAE is required to convert images to and from latent space.
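As a rough illustration of why working in latent space is cheaper: Stable Diffusion-family VAEs typically downsample each spatial dimension by a factor of 8 and use 4 latent channels, so the diffusion model operates on far fewer elements than the raw image. A minimal sketch (the 8x downscale and 4-channel figures are the common SD convention; other model families may differ):

```python
def latent_shape(height, width, downscale=8, latent_channels=4):
    """Shape of the latent tensor for an RGB image of the given size
    (assumes the usual SD-style 8x spatial downscale, 4 channels)."""
    return (latent_channels, height // downscale, width // downscale)

img_elems = 3 * 1024 * 1024          # RGB pixel values in a 1024x1024 image
c, h, w = latent_shape(1024, 1024)
lat_elems = c * h * w

print((c, h, w))                     # (4, 128, 128)
print(img_elems // lat_elems)        # 48x fewer elements in latent space
```

Every denoising step runs on that much smaller tensor, which is where the speed and memory savings come from; the VAE pays the conversion cost once at each end.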
When?
- VAE decodes latents to images as the final step of image generation
- VAE encodes images to latents as the first step of inpaint and image-to-image workflows
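The two roles above can be summarized as a small sketch of where the VAE sits in each workflow (step names are illustrative, not SD.Next internals):

```python
# Hypothetical step lists; the names are for illustration only.
TXT2IMG = ["text_encode", "denoise_loop", "vae_decode"]
IMG2IMG = ["vae_encode", "add_noise", "denoise_loop", "vae_decode"]

print(TXT2IMG[-1])   # vae_decode - always the final step
print(IMG2IMG[0])    # vae_encode - img2img/inpaint start from an image
```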
Tip
There is no image generation without VAE
If you're not specifying a custom VAE model, it simply means you're using the default VAE included in the image generation model
VAE Processing
SD.Next supports three VAE processing types:
- Full: use the full VAE model
  See section on Choosing a VAE for more information
- Tiny: use a tiny VAE model
  Tiny VAE is a smaller version of the full model with lower memory usage. It can generate images faster, but may be less accurate than the full VAE.
- Remote: use a remote VAE model
  Remote VAE is hosted on a remote server and accessed over the network. It is useful on machines with limited memory or compute. See section on Remote VAE for more information
Choosing a VAE
When choosing a custom VAE, select one that is compatible with your base model. For example, an SD-XL model requires an SD-XL-compatible VAE.
Set the VAE model path in Settings -> System Paths -> VAE.
Default path: models/VAE.
VAE can be set to:
- Default
  Use the default VAE model specified in the image generation model
- Automatic
  If there is a VAE model with the same name as the image generation model, it will be used; otherwise, the default VAE is used
- Custom
  Load a custom VAE model from the list of available models
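The Automatic option's matching rule boils down to a filename comparison. A sketch of that idea (the helper and filenames are hypothetical, not SD.Next's actual implementation):

```python
from pathlib import Path

def pick_vae(model_file, vae_files):
    """Return the VAE file whose name (minus extension) matches the
    image generation model's name, or None to fall back to the
    model's built-in default VAE."""
    stem = Path(model_file).stem
    for vae in vae_files:
        if Path(vae).stem == stem:
            return vae
    return None

print(pick_vae("myModelXL.safetensors",
               ["myModelXL.safetensors", "sdxl-vae.safetensors"]))
# a match is found, so that VAE would be loaded
print(pick_vae("otherModel.safetensors", ["sdxl-vae.safetensors"]))
# None - no same-named VAE, so the default is used
```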
Remote VAE
Remote VAE is a free feature hosted by Huggingface.
For more information, see https://huggingface.co/docs/diffusers/main/en/hybrid_inference/overview
Notes:
- When using remote VAE, you must have an active internet connection
- Remote VAE is only available for some models: for example, SD 1.5, SD-XL, FLUX.1 and HiDream
- Remote VAE is limited to 2048x2048 resolution
- On remote VAE failure, SD.Next automatically switches to local VAE
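The constraints in the notes above can be expressed as a small pre-flight check. This is only a sketch of the decision logic, not SD.Next's actual fallback code:

```python
REMOTE_MAX_SIDE = 2048  # remote VAE resolution limit noted above

def can_use_remote_vae(width, height, online=True):
    """Decide whether a remote VAE call is possible; when this
    returns False the caller should use the local VAE instead."""
    if not online:
        return False                              # needs an internet connection
    if width > REMOTE_MAX_SIDE or height > REMOTE_MAX_SIDE:
        return False                              # above the 2048x2048 limit
    return True

print(can_use_remote_vae(1024, 1024))             # True
print(can_use_remote_vae(4096, 4096))             # False - falls back to local
```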
Note
Privacy: when using remote VAE, your latents/images are sent to Huggingface servers for processing.
However, they are NOT stored on the server.
No other information is sent or stored to remote servers.
Common Issues
VAE decode is a critical step in image generation, and it can cause several common issues. It is also the most compute- and memory-intensive step in the pipeline.
- Image generation results in a black image
  This is usually caused by numerical instability in the VAE model combined with the chosen GPU settings. Many VAE models are unstable at float16 precision and need either float32 (higher memory use) or bfloat16 (slightly lower precision, but fewer overflows).
  - If your model is generating black images, try switching to bfloat16 precision in the GPU settings
  - If that doesn't help, try a custom VAE model that is more stable at lower precision; such models are commonly named fp16-fixed
- Image generation hangs at 100%
  It is usually not hanging: the generation steps are complete and VAE decode is still processing. If VRAM is exhausted and the system starts swapping, completion can take much longer (often around 10x), so avoid RAM->VRAM swapping where possible.
- Image generation fails with out-of-memory error
  VAE decode can use a large amount of memory. If you hit memory limits, try enabling VAE Tiling or reducing image resolution. Also note that refine and hires steps usually need more memory than the initial generate step because they run at higher resolution.
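The float16 instability behind black images comes down to dynamic range: float16 cannot represent values past roughly 65504, so large VAE activations overflow to inf/NaN, while bfloat16 keeps float32's exponent range at the cost of mantissa precision. A stdlib-only demonstration (Python's struct module supports the IEEE 754 half format via the 'e' code):

```python
import struct

def fits_float16(x):
    """True if x can be represented in IEEE 754 half precision."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False

print(fits_float16(60000.0))   # True  - within the float16 range
print(fits_float16(70000.0))   # False - overflows; inside a VAE such a
                               # value becomes inf/NaN and the decoded
                               # image comes out black
# bfloat16 (not packable via struct) shares float32's exponent range,
# so the same activation survives - hence the bfloat16 advice above.
```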