Stable-Casdade
Original repo: https://github.com/Stability-AI/StableCascade
Use
- Set your compute precision in Settings -> Compute -> Precision
to either BF16 (if supported) or FP32 (if not supported)
Note: FP16 is not supported for this model - Enable model offloading in Settings -> Diffusers -> Model CPU offload
without this, stable cascade will use >16GB of VRAM - Recommended: Set sampler to Default
- Select model from Networks -> Models -> Reference
you can select either Full or Lite variation of the model and it will automatically be downloaded on first use and loaded into SD.Next
attempting to load a manually downloaded safetensors files is not supported as model requires special handling
SD.Next automatically chooses BF16 variation when downloading from networs -> reference
since its smaller and can be used with either BF16 or FP32 compute precision
UNet models:
- Put the UNet safetensors in
models/UNet
folder and put the text encoder (if you use one) in there too. Text encoder name should be the UNet Name + _text_encoder - Load the Stable Cascade base (or a custom decoder) from Huggingface as the main model first, then load the UNet (prior) model as the UNet model from settings.
Example UNet name: sc_unet.safetensors
Example Text Encoder name: sc_unet_text_encoder.safetensors
Params
- Prompt & Negative prompt: as usual
- Width & Height: as usual
- CFG scale: used to condition the prior model, reference value is ~4
- Secondary CFG scale: used to condition decoder model, reference value is ~1
- Steps: used to control number of steps of the prior model
- Refiner steps: used to control number of steps of the decoder model
- Sampler: recommended to set to Default before loading a model
Stable Cascade has its own sampler and results with standard samplers will look suboptimal
Built-in sampler is DDIM/DDPM based, so if you want to experiment at least use similar sampler
Notes
- If model download fails, simply retry it, it will continue from where it left off
- Model consists out of 3 stages split into 2 pipelines which are exected as C -> B -> A:
- Full variation requires ~10GB VRAM and runs at ~3 it/s on RTX4090 at 1024px
- Lite variation requires ~4GB VRAM and runs at ~6 it/s on RTX4090 at 1024px
Note: performance numbers are for combined pipeline, both decoder and prior
Variations
Overview
Stable cascade is a 3-stage model split into two pipelines (so-called prior and decoder) and comes into two main variations: Full and Lite
You can select which one to use from Networks -> Models -> Reference
Additionally, each variation comes in 3 different precisions: FP32, BF16, and FP16
Note: FP16 is an unofficial version by @KohakuBlueleaf of the model fixed to work with FP16 and may result in slightly different output
Which precision is going to get loaded depends on:
- your user preference in Settings -> Compute -> Precision
- and GPU compatibility as not all GPUs support all precision types
Sizes
Stage A and auxiliary models sizes are fixed and noted above
Stage B and Stage C models are dependent on the variation and precision used
Variation | Precision | Stage B | Stage C |
---|---|---|---|
Full | FP32 | 6.2GB | 14GB |
Full | BF16 | 3.1GB | 7GB |
Full | FP16 | N/A | 7GB |
Lite | FP32 | 2.8GB | 4GB |
Lite | BF16 | 1.4GB | 2GB |
Lite | FP16 | N/A | N/A |