Variational Auto-Encoder (VAE)
VAE is a model that can be used to compress and decompress images.
It is a type of autoencoder that learns a latent space representation of the input data. The model is trained to minimize the reconstruction error of the input data, while also learning a latent space that is continuous and smooth. This allows the model to generate new images by sampling from the latent space.
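The two training goals described above can be sketched as a loss with two terms. This is a minimal illustration of the standard VAE formulation, not code from SD.Next or from any particular model: the function names and the `beta` weight are hypothetical, and mean squared error stands in for whatever reconstruction metric a real model uses.

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over dimensions.

    This term pulls encoded latents toward a standard normal distribution,
    which is what keeps the latent space continuous and smooth.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Reconstruction error plus a weighted KL term (beta is an assumed knob)."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, logvar)
```

A perfect reconstruction with a latent distribution that already matches N(0, 1) gives a loss of zero; any deviation in either term raises it.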
Why?
All popular image generation models work in a compressed latent space to allow for faster inference and a reduced memory footprint, which means a VAE is required to convert images to and from latent space.
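To give a rough sense of how much smaller the latent space is, the sketch below computes the latent shape and compression ratio for a given image size. The defaults (8x spatial downscale, 4 latent channels) are assumptions that match SD 1.5 and SD-XL style VAEs; other models differ (FLUX.1, for example, uses 16 latent channels), so treat the numbers as illustrative.

```python
def latent_shape(width, height, downscale=8, latent_channels=4):
    """Return (channels, height, width) of the latent for a given image size."""
    if width % downscale or height % downscale:
        raise ValueError(f"width and height must be multiples of {downscale}")
    return (latent_channels, height // downscale, width // downscale)

def compression_ratio(width, height, image_channels=3, downscale=8, latent_channels=4):
    """Ratio of image element count to latent element count."""
    c, h, w = latent_shape(width, height, downscale, latent_channels)
    return (image_channels * width * height) / (c * h * w)

if __name__ == "__main__":
    print(latent_shape(1024, 1024))       # latent shape for a 1024x1024 image
    print(compression_ratio(1024, 1024))  # elements in image / elements in latent
```

Under these assumptions the diffusion model operates on roughly 48 times fewer values than the full-resolution RGB image contains.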
When?
- VAE is used to decode a latent into an image as the final step of image generation
- VAE is used to encode an image into a latent as the first step of image processing in inpaint or image-to-image workflows
Tip
There is no image generation without VAE
If you're not specifying a custom VAE model, it simply means you're using a default one specified in the model
VAE Processing
SD.Next supports three types of VAE processing:
- Full: use full VAE model
See section on Choosing a VAE for more information
- Tiny: use tiny VAE model
Tiny VAE model is a smaller version of the full VAE model and has a reduced memory footprint
It is useful for generating images quickly, but may not be as accurate as the full VAE model
- Remote: use remote VAE model
Remote VAE model is a VAE model that is stored on a remote server and accessed over the network
It is useful for generating images on a machine with limited memory or processing power
See section on Remote VAE for more information
Choosing a VAE
When choosing a custom VAE, you must select one that is compatible with the image generation model you are using. For example, an SD-XL model can only use a VAE built for SD-XL.
The location for VAE models is specified in Settings -> System Paths -> VAE; the default is models/VAE
VAE can be set to:
- Default
Use the default VAE model specified in the image generation model
- Automatic
If there is a VAE model with the same name as the image generation model, it will be used
Otherwise, the default VAE is used
- Custom
Load a custom VAE model from the list of available models
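The three settings above can be sketched as a small lookup routine. This is an illustration of the described behavior, not SD.Next's actual implementation: `resolve_vae`, its parameters, and the list of file extensions are all hypothetical.

```python
from pathlib import Path

# Assumed set of file extensions; the real list may differ.
VAE_EXTENSIONS = (".safetensors", ".pt", ".ckpt")

def resolve_vae(setting, model_path, vae_dir, custom_name=None):
    """Return the VAE file to load, or None to use the model's built-in VAE."""
    vae_dir = Path(vae_dir)
    if setting == "Default":
        return None
    if setting == "Automatic":
        # Look for a VAE file that shares its base name with the model file.
        stem = Path(model_path).stem
        for ext in VAE_EXTENSIONS:
            candidate = vae_dir / f"{stem}{ext}"
            if candidate.is_file():
                return candidate
        return None  # no match: fall back to the default VAE
    if setting == "Custom":
        if custom_name is None:
            raise ValueError("Custom setting requires a VAE file name")
        candidate = vae_dir / custom_name
        if not candidate.is_file():
            raise FileNotFoundError(candidate)
        return candidate
    raise ValueError(f"unknown VAE setting: {setting!r}")
```

Returning `None` for both Default and an unmatched Automatic lookup mirrors the description: in both cases the VAE specified in the image generation model is used.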
Remote VAE
Remote VAE is a free feature hosted by Huggingface.
For more information, see https://huggingface.co/docs/diffusers/main/en/hybrid_inference/overview
Notes:
- When using remote VAE, you must have an active internet connection
- Remote VAE is only available for some models: for example, SD 1.5, SD-XL and FLUX.1
- Remote VAE is limited to 2048x2048 resolution
- In case of remote VAE failure, SD.Next will auto-switch to local VAE
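The fallback behavior in the notes above might look something like the sketch below. It is not SD.Next's code: `decode_with_fallback`, `remote_decode`, and `local_decode` are hypothetical names, and skipping the remote call for oversized images (rather than attempting it and failing) is an assumption.

```python
REMOTE_MAX_SIDE = 2048  # resolution limit taken from the notes above

def decode_with_fallback(latent, width, height, remote_decode, local_decode):
    """Try the remote VAE first and fall back to the local VAE.

    remote_decode and local_decode are caller-supplied callables that take
    a latent and return an image. Returns (image, source) where source is
    "remote" or "local".
    """
    if width <= REMOTE_MAX_SIDE and height <= REMOTE_MAX_SIDE:
        try:
            return remote_decode(latent), "remote"
        except Exception as err:  # network errors, timeouts, server errors
            print(f"remote VAE failed ({err}); switching to local VAE")
    return local_decode(latent), "local"
```

Passing the decoders in as callables keeps the sketch independent of any specific network client or VAE library.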
Note
Privacy: when using remote VAE, your latents/images are sent to Huggingface servers for processing.
However, they are NOT stored on the server.
No other information is sent to or stored on remote servers.
Common Issues
VAE decode is a critical step in image generation and, as such, a common source of issues.
It is also the single most compute- and memory-intensive step in image generation.
- Image generation results in a black image
This is usually caused by numerical instability in the VAE model itself combined with the chosen GPU settings. For example, many current VAE models are not stable at standard float16 precision and require either full float32 precision (which doubles the memory requirements) or bfloat16, which slightly reduces precision but allows the model to work without overflows.
  - If your model is generating black images, try switching to bfloat16 precision in the GPU settings.
  - If that doesn't help, try using a custom VAE model that is more stable at lower precision. Such a VAE is commonly named fp16-fixed.
- Image generation hangs at 100%
It's not hanging: reaching 100% means that all generate steps are done and the VAE is now processing the latent to produce the final image.
However, if you're running out of VRAM and the system decides to swap to system RAM, it may easily take 10x longer to finish. It is recommended to disallow system RAM->VRAM swapping.
- Image generation fails with out-of-memory error
VAE decode is the single most memory-intensive step in image generation and can result in large memory usage. If you cannot process images due to memory constraints, try using a smaller VAE model (such as Tiny) or reducing the resolution of the images. Note that running refine or hires steps will require more memory than the initial generate step, simply because they typically run at higher resolution.
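The float16 overflow behind the black-image issue above can be illustrated with the standard library alone. The sketch converts a value to float16 and to a simulated bfloat16; the helper names are hypothetical, and the bfloat16 conversion simply truncates a float32 to its top 16 bits, which is a simplification of the rounding that real hardware performs.

```python
import math
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def to_float16(x):
    """Round-trip x through float16; values out of range become infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def to_bfloat16(x):
    """Simulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    bfloat16 keeps float32's 8 exponent bits (same range) but only 7
    mantissa bits (less precision). Truncation here is a simplification.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

if __name__ == "__main__":
    activation = 70000.0  # made-up value just above the float16 range
    print("float16: ", to_float16(activation))
    print("bfloat16:", to_bfloat16(activation))
```

A value just above the float16 range becomes infinity in float16 but stays finite, if slightly less precise, in bfloat16. Once an infinity or NaN enters the decode, it propagates through the remaining layers, which is consistent with the black images described above.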