Variational Autoencoder (VAE)

VAE is a model that can be used to compress and decompress images.
It is a type of autoencoder that learns a latent space representation of the input data. The model is trained to minimize the reconstruction error of the input data, while also learning a latent space that is continuous and smooth. This allows the model to generate new images by sampling from the latent space.
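
The two training goals described above are usually expressed as a loss with two terms: a reconstruction error, and a KL-divergence penalty that keeps the latent distribution close to a standard normal (which is what makes the latent space smooth enough to sample from). The following is a minimal pure-Python sketch of that standard formulation, not the training code of any specific VAE. The function names are illustrative, and mean squared error is an assumed choice for the reconstruction term.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps, with eps drawn from a standard normal."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def reconstruction_error(original, reconstructed):
    """Mean squared error between the input and its reconstruction (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def vae_loss(original, reconstructed, mu, logvar, kl_weight=1.0):
    """Total loss: reconstruction term plus a weighted KL regularizer."""
    return reconstruction_error(original, reconstructed) + kl_weight * kl_divergence(mu, logvar)
```

A perfect reconstruction with a latent that already matches the standard normal (mu = 0, logvar = 0) gives a loss of zero; any deviation in either term raises it.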

Why?

All popular image generation models work in a compressed latent space, which allows faster inference and a smaller memory footprint. This means a VAE is required to convert images to and from latent space.
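
To make the saving concrete, here is a rough sketch of the size arithmetic. It assumes the layout used by SD 1.5 and SD-XL style VAEs (8x downscale per side, 4 latent channels); other model families use different values, and the `latent_shape` and `compression_ratio` helpers are purely illustrative.

```python
def latent_shape(width, height, scale_factor=8, latent_channels=4):
    """Return (channels, height, width) of the latent for a given image size.

    Defaults are assumptions matching SD 1.5 / SD-XL style VAEs.
    """
    if width % scale_factor or height % scale_factor:
        raise ValueError("image size must be a multiple of the scale factor")
    return (latent_channels, height // scale_factor, width // scale_factor)


def compression_ratio(width, height, scale_factor=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(width, height, scale_factor, latent_channels)
    return (3 * width * height) / (c * h * w)


print(latent_shape(1024, 1024))       # (4, 128, 128)
print(compression_ratio(1024, 1024))  # 48.0
```

Under these assumptions the diffusion model works on 48 times fewer values than the full RGB image.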

When?

VAE is used to decode latents into an image as the final step of image generation.
VAE is used to encode an image into latents as the first step of image processing in inpaint or image-to-image workflows.

Tip

There is no image generation without a VAE.
If you do not specify a custom VAE model, you are using the default one specified in the model.

VAE Processing

SD.Next supports three types of VAE processing:

Full: use full VAE model
See section on Choosing a VAE for more information
Tiny: use tiny VAE model
Tiny VAE model is a smaller version of the full VAE model and has a reduced memory footprint
It is useful for generating images quickly, but may not be as accurate as the full VAE model
Remote: use remote VAE model
Remote VAE model is a VAE model that is stored on a remote server and accessed over the network
It is useful for generating images on a machine with limited memory or processing power
See section on Remote VAE for more information

Choosing a VAE

When choosing a custom VAE, you must select one that is compatible with the image generation model you are using. For example, an SD-XL model can only use VAEs built for SD-XL.

Location for VAE models is specified in Settings -> System Paths -> VAE; the default is models/VAE

VAE can be set to:

Default
Use the default VAE model specified in the image generation model
Automatic
If there is a VAE model with the same name as image generation model, it will be used
Otherwise, it will use default VAE
Custom
Load a custom VAE model from the list of available models
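
The three modes can be read as a simple resolution order. The sketch below illustrates that logic only; it is not SD.Next's actual implementation, and the `resolve_vae` function, its arguments, and the file names used with it are hypothetical.

```python
from pathlib import Path


def resolve_vae(mode, model_name, available_vaes, custom_choice=None):
    """Return the VAE file to load, or None to use the model's built-in VAE.

    mode: "Default", "Automatic" or "Custom" (the options described above).
    available_vaes: file names found in the VAE folder (hypothetical input).
    """
    if mode == "Default":
        return None
    if mode == "Automatic":
        # Look for a VAE whose base name matches the generation model's name.
        wanted = Path(model_name).stem
        for candidate in available_vaes:
            if Path(candidate).stem == wanted:
                return candidate
        return None  # no match: fall back to the default VAE
    if mode == "Custom":
        if custom_choice not in available_vaes:
            raise ValueError(f"unknown VAE: {custom_choice!r}")
        return custom_choice
    raise ValueError(f"unknown mode: {mode!r}")
```

Returning None for "use the built-in VAE" keeps the Default case and the Automatic fallback on the same code path.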

Remote VAE

Remote VAE is a free feature hosted by Huggingface.
For more information, see https://huggingface.co/docs/diffusers/main/en/hybrid_inference/overview

Notes:

When using remote VAE, you must have an active internet connection
Remote VAE is only available for some models: for example, SD 1.5, SD-XL, FLUX.1 and HiDream
Remote VAE is limited to 2048x2048 resolution
In case of remote VAE failure, SD.Next will auto-switch to local VAE
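
The notes above imply a decision flow: check the size limit, try the remote endpoint, and fall back to the local VAE on failure. A hedged sketch of that flow follows. `decode_remote` and `decode_local` are hypothetical callables standing in for the real decoders, and sending oversized requests straight to the local VAE is an assumption; the notes only state the 2048x2048 limit and the automatic fallback.

```python
MAX_REMOTE_SIDE = 2048  # remote VAE resolution limit stated in the notes


def decode_with_fallback(latent, width, height, decode_remote, decode_local):
    """Try remote decode first; use the local VAE if remote is unavailable.

    Returns an (image, backend) tuple where backend is "remote" or "local".
    """
    if width > MAX_REMOTE_SIDE or height > MAX_REMOTE_SIDE:
        # Assumption: requests over the limit go straight to the local VAE.
        return decode_local(latent), "local"
    try:
        return decode_remote(latent), "remote"
    except Exception:
        # Any remote failure (network down, server error) triggers the fallback.
        return decode_local(latent), "local"
```

Returning the backend name alongside the image makes it easy to log when the fallback was used.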

Note

Privacy: when using remote VAE, your latents/images are sent to Huggingface servers for processing.
However, they are NOT stored on the server.
No other information is sent to or stored on remote servers.

Common Issues

VAE decode is a critical step in image generation and as such, it can be a source of many issues.
It is also the single most compute- and memory-intensive step in image generation.

Image generation results in a black image
This is usually caused by numerical instability in the VAE model itself, combined with the chosen GPU settings. For example, many current VAE models are not stable at standard float16 precision and require either full float32 precision (which doubles the memory requirements) or bfloat16, which slightly reduces precision but allows the model to work without overflows.
If your model is generating black images, try switching to bfloat16 precision in the GPU settings.
If that doesn't help, try using a custom VAE model that is more stable at lower precision.
Such VAEs are commonly named fp16-fixed
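
The overflow behind the black-image issue can be reproduced without a GPU. The sketch below uses only Python's standard library to round a value to float16 and to a truncated bfloat16. It illustrates the number formats, not how SD.Next or any VAE performs the conversion; the helper names are illustrative, and real hardware typically rounds to nearest rather than truncating.

```python
import math
import struct


def to_float16(x):
    """Round x to IEEE float16; values beyond the float16 range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct raises here; GPU math would produce +/-inf instead.
        return math.copysign(math.inf, x)


def to_bfloat16(x):
    """Approximate bfloat16 by keeping the top 16 bits of the float32 encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


activation = 70000.0  # larger than the float16 maximum of 65504
print(to_float16(activation))   # inf
print(to_bfloat16(activation))  # 69632.0
```

bfloat16 keeps the float32 exponent range, so the value stays finite at the cost of a small rounding error, whereas float16 overflows to infinity.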
Image generation hangs at 100%
It is not hanging: all generation steps are done and the VAE is now decoding the latent into the final image.
However, if you are running out of VRAM and the system swaps it to system RAM, this step can easily take 10x longer to finish. It is recommended to disable swapping between VRAM and system RAM.
Image generation fails with out-of-memory error
VAE decode is the single most memory intensive step in image generation and can result in large memory usage. If you cannot process images due to memory constraints, try enabling VAE Tiling or reducing the resolution of the images. Note that running refine or hires steps will require more memory than the initial generate step - simply because they typically run at higher resolution.
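
The idea behind VAE tiling is to process the latent in smaller overlapping pieces, so that peak memory depends on the tile size rather than on the full image. The sketch below only computes tile coordinates. The tile size, the overlap, the 8x downscale in the example, and the `tile_starts` and `tile_boxes` helpers are illustrative assumptions and do not reflect SD.Next's actual tiling parameters or blending.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles covering [0, length) with the given overlap."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile is aligned to the far edge
    return starts


def tile_boxes(width, height, tile=64, overlap=8):
    """Return (x0, y0, x1, y1) boxes, in latent pixels, covering the latent."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes = tile_boxes(128, 128)  # e.g. latent of a 1024x1024 image at 8x downscale
print(len(boxes))  # 9
```

Each box can then be decoded on its own and the overlapping borders blended, which trades extra decode passes for a lower peak memory requirement.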