# Video
SD.Next supports video creation via the top-level Video tab
Support includes T2V (text-to-video) and I2V (image-to-video) workflows
> [!TIP]
> The latest video models use LLM-based text encoders for prompting and therefore require very long and descriptive prompts
## Supported models
SD.Next supports the following models out-of-the-box:
- Hunyuan: HunyuanVideo, FastHunyuan, SkyReels | T2V, I2V
- WAN21: 1.3B, 14B | T2V, I2V
- LTXVideo: 0.9.0, 0.9.1, 0.9.5 | T2V, I2V
- CogVideoX: 2B, 5B | T2V, I2V
- Allegro: T2V
- Mochi1: T2V
- Latte1: T2V
> [!NOTE]
> All models are auto-downloaded upon first use
> The download location uses the folder specified in Settings -> System paths -> Huggingface
## Reference list
| Engine | Model | Type | Size | Optimal Resolution | Default Sampler | Reference Values | Special Notes |
|---|---|---|---|---|---|---|---|
| Hunyuan | HunyuanVideo | T2V | 40.9GB | 1280x720 | Euler FlowMatch | Frames:129 CFG:6.0 Steps:50 | N/A |
| Hunyuan | HunyuanVideo | I2V | 59.2GB | 1280x720 | Euler FlowMatch | Frames:129 CFG:1.0 Steps:50 | Issue: transformers version / TBD 16ch VAE |
| Hunyuan | FastHunyuan | T2V | 25.0GB+15GB | 1280x720 | Euler FlowMatch | Frames:125 CFG:6.0 True:1.0 Shift:17 Steps:6 | N/A |
| Hunyuan | SkyReels | T2V | 25.0GB+15GB | 960x544 | Euler FlowMatch | Frames:97 CFG:1.0 True:6.0 Steps:50 | N/A |
| Hunyuan | SkyReels | I2V | 25.0GB+15GB | 960x544 | Euler FlowMatch | Frames:97 CFG:1.0 True:6.0 Steps:50 | N/A |
| WAN21 | WAN 2.1 1.3B | T2V | 28.2GB | 832x480 | UniPC | Frames:81 CFG:5.0 Steps:50 | N/A |
| WAN21 | WAN 2.1 14B | T2V | 78.1GB | 1280x720 | UniPC | Frames:81 CFG:5.0 Steps:50 | N/A |
| WAN21 | WAN 2.1 14B 480p | I2V | | 832x480 | UniPC | Frames:81 CFG:5.0 Steps:50 | |
| WAN21 | WAN 2.1 14B 720p | I2V | | 1280x720 | UniPC | Frames:81 CFG:5.0 Steps:50 | |
| LTXVideo | LTXVideo 0.9.0 | T2V | | 704x480 | Euler FlowMatch | Frames:161 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.0 | I2V | | 704x480 | Euler FlowMatch | Frames:161 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.1 | T2V | 24.1GB | 704x512 | Euler FlowMatch | Frames:161 CFG:3 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.1 | I2V | 24.1GB | 704x512 | Euler FlowMatch | Frames:161 CFG:3 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.5 | T2V | 24.8GB | 768x512 | Euler FlowMatch | Frames:161 Steps:40 | N/A |
| LTXVideo | LTXVideo 0.9.5 | I2V | | 768x512 | Euler FlowMatch | Frames:161 Steps:40 | N/A |
| CogVideoX | CogVideoX 1.0 2B | T2V | | 720x480 | Cog DDIM | Frames:49 CFG:6.0 Steps:50 | N/A |
| CogVideoX | CogVideoX 1.0 5B | T2V | | 720x480 | Cog DDIM | Frames:49 CFG:6.0 Steps:50 | N/A |
| CogVideoX | CogVideoX 1.0 5B | I2V | | 720x480 | Cog DDIM | Frames:49 CFG:6.0 Steps:50 | N/A |
| CogVideoX | CogVideoX 1.5 5B | T2V | 30.3GB | 1360x768 | Cog DDIM | Frames:81 CFG:6.0 Steps:50 | Issue: blank output |
| CogVideoX | CogVideoX 1.5 5B | I2V | | 1360x768 | Cog DDIM | Frames:81 CFG:6.0 Steps:50 | Issue: blank output |
| Allegro | Allegro | T2V | 24.7GB | 1280x720 | Euler a | Frames:88 CFG:7.5 Steps:100 | Issue: blank output |
| Mochi | Mochi1 | T2V | 23.4GB | 512x512 | Euler FlowMatch | Frames:16 CFG:7.5 Steps:50 | N/A |
| Latte | Latte1 | T2V | 23.4GB | 512x512 | DDIM | Frames:16 CFG:7.5 Steps:50 | N/A |
> [!TIP]
> Each model may require a specific resolution or specific parameters to produce quality results
> This also includes advanced parameters such as Sampler shift, which would normally not need tweaking in text-to-image workloads
> See each model's original notes for parameter recommendations
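One reason frame counts matter: most of these models use a causal video VAE with 4x temporal compression, which is why the reference frame counts in the table (129, 125, 97, 81, 161, 49) all have the form 4k+1. A minimal sketch of snapping a requested frame count to a valid value, assuming that 4x factor (it is typical but not universal; Allegro's 88 and Mochi's 16, for example, do not follow it):

```python
def snap_frame_count(requested: int, temporal_compression: int = 4) -> int:
    """Snap a requested frame count to the nearest valid value of the form
    k * temporal_compression + 1, as required by most video VAEs."""
    if requested < 1:
        raise ValueError("frame count must be positive")
    k = max(0, round((requested - 1) / temporal_compression))
    return k * temporal_compression + 1

print(snap_frame_count(128))  # -> 129
print(snap_frame_count(100))  # -> 101
print(snap_frame_count(81))   # -> 81 (already valid: 20*4+1)
```

Check the model's own documentation for its exact temporal compression factor before relying on this rule.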
> [!NOTE]
> It is recommended to use Default as the sampler for all models unless you need to change a specific sampler setting
> For example, to change Sampler Shift, you need to select the appropriate sampler for that model
## Legacy models
Additional video models are available as individually selectable scripts in either the text or image interfaces
- Stable Video Diffusion: Base, XT 1.0 and XT 1.1
- VGen
- AnimateDiff
## LoRA
SD.Next includes LoRA support for Hunyuan, LTX, WAN, Mochi and Cog models
See LoRA for more details
## Optimizations
> [!WARNING]
> Any use on GPUs with less than 16GB VRAM or systems with less than 48GB RAM is experimental
### Memory
Offloading helps by moving data between system RAM and GPU VRAM as needed
However, there is no way around the requirement that the entire model must be loaded into system RAM before it can be used
Check the total model size in the table above and make sure you have enough RAM to load the model
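A quick back-of-the-envelope feasibility check using the Size column from the table above. The 25% working headroom is an illustrative assumption, not an SD.Next value; actual overhead depends on precision, offload mode, and what else is running:

```python
def total_size_gb(size_spec: str) -> float:
    """Sum a Size column entry such as '25.0GB+15GB' (e.g. transformer + text-encoder)."""
    return sum(float(part.removesuffix("GB")) for part in size_spec.split("+"))

def fits_in_ram(size_spec: str, ram_gb: float, headroom: float = 1.25) -> bool:
    """Rough check: model weights plus ~25% working headroom must fit in system RAM."""
    return total_size_gb(size_spec) * headroom <= ram_gb

print(total_size_gb("25.0GB+15GB"))  # -> 40.0
print(fits_in_ram("78.1GB", 64))     # -> False (WAN 2.1 14B will not load on 64GB RAM)
```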
### Offloading
Enable offloading so model components can be moved in and out of VRAM as needed
Most models support all offloading types:
- Balanced: recommended, but may require extra tuning
- Model: simplest
- Sequential: highest memory savings, but slowest
See Offload for more details
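A toy sketch of why the offload modes trade speed for VRAM. Component names and sizes are illustrative, not SD.Next internals: Model offload keeps one whole component resident at a time, while Sequential offload keeps only one layer resident (here assumed to be ~1/40 of a component); Balanced, not modeled here, splits placement between VRAM and RAM per component:

```python
# Illustrative component sizes in GB; real values vary per model.
COMPONENTS = {"text_encoder": 10.0, "transformer": 25.0, "vae": 0.5}

def peak_vram_gb(strategy: str) -> float:
    """Estimate peak VRAM usage for each offload strategy (toy model)."""
    if strategy == "none":        # everything resident in VRAM at once
        return sum(COMPONENTS.values())
    if strategy == "model":       # one whole component resident at a time
        return max(COMPONENTS.values())
    if strategy == "sequential":  # one layer at a time; assume ~40 equal layers
        return max(gb / 40 for gb in COMPONENTS.values())
    raise ValueError(strategy)

for s in ("none", "model", "sequential"):
    print(s, round(peak_vram_gb(s), 2))  # none 35.5 / model 25.0 / sequential 0.62
```

The lower the peak, the more transfers between RAM and VRAM per step, which is why Sequential is the slowest option.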
### Quantization
Enable on-the-fly quantization during load in Settings -> Quantization for additional memory savings
- BnB
- TorchAO
- Optimum-Quanto
Quantization can be enabled for the Transformer and the Text-Encoder independently
- Most T2V and I2V models support on-the-fly quantization of the transformer module
- Most T2V models support quantization of the text-encoder, while I2V models may not due to the inability to quantize image vectors
See Quantization for more details
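To see where the savings come from, here is a minimal sketch of 8-bit absmax quantization, the core idea behind quantizers such as BnB. Real implementations work per-block on tensors with outlier handling; this toy version quantizes a flat list of weights:

```python
def quantize_int8(weights: list) -> tuple:
    """Map floats to int8 range [-127, 127] using a single absmax scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list, scale: float) -> list:
    return [v * scale for v in q]

w = [0.5, -1.27, 0.0, 1.0]
q, s = quantize_int8(w)
restored = dequantize(q, s)
print(q)  # -> [50, -127, 0, 100]
# Storage drops 4x versus fp32 (1 byte vs 4 bytes per weight),
# at the cost of small rounding error on dequantize.
```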
### Decoding
Instead of using the full VAE packaged with the model itself to decode the final frames, SD.Next supports using a Tiny VAE, as well as a Remote VAE, to decode video
- Tiny VAE: support for Hunyuan, WAN, Mochi
- Remote VAE: support for Hunyuan
See VAE for more details
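Decoding is memory-hungry because the VAE expands latents back to pixels. The bookkeeping below assumes the common 8x spatial / 4x temporal compression layout used by most of these video VAEs; exact factors vary per model:

```python
def pixel_shape(latent_frames: int, latent_h: int, latent_w: int,
                ts: int = 4, ss: int = 8) -> tuple:
    """Pixel-space output shape for a video VAE with `ts`x temporal
    and `ss`x spatial compression (causal VAE: frames = (n-1)*ts + 1)."""
    return ((latent_frames - 1) * ts + 1, latent_h * ss, latent_w * ss)

print(pixel_shape(21, 60, 104))  # -> (81, 480, 832), i.e. WAN 2.1 at 480p
```

A Tiny VAE replaces this decoder with a much smaller distilled one, and a Remote VAE moves the decode off the local GPU entirely.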
### Processing
SD.Next supports two types of optional processing acceleration:
- FasterCache: supported for Hunyuan, Mochi, Latte, Allegro, Cog, WAN, LTX
- PyramidAttentionBroadcast: supported for Hunyuan, Mochi, Latte, Allegro, Cog, WAN, LTX
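Both accelerators exploit the same observation: outputs of expensive blocks change little between adjacent denoising steps, so they can be recomputed only every N steps and reused (broadcast) in between. A toy sketch of that step schedule only; real implementations cache per attention block with step-range and layer heuristics:

```python
def recompute_schedule(num_steps: int, interval: int) -> list:
    """True = full compute at this step, False = reuse the cached result."""
    return [step % interval == 0 for step in range(num_steps)]

sched = recompute_schedule(10, 3)
saved = sched.count(False) / len(sched)
print(saved)  # -> 0.6, i.e. 60% of steps served from cache
```

Larger intervals save more compute but risk visible quality loss, which is why these features are optional.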
### Interpolation
For all video modules, SD.Next supports adding interpolated frames to the video for smoother output
Interpolation (if enabled) is performed using RIFE: Real-Time Intermediate Flow Estimation
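The frame-count arithmetic behind interpolation, as a quick sketch (the interpolation factor here is a generic parameter, not a specific SD.Next setting): inserting frames between each consecutive pair multiplies the effective frame rate without re-running the diffusion model:

```python
def interpolated_frames(src_frames: int, factor: int) -> int:
    """Total frames after inserting (factor - 1) frames between each pair."""
    return (src_frames - 1) * factor + 1

def duration_seconds(frames: int, fps: float) -> float:
    return frames / fps

print(interpolated_frames(81, 2))  # -> 161: an 81-frame WAN clip doubled
# Played back at double the source fps, motion appears smoother
# while the clip duration stays essentially unchanged.
```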
## Issues/Limitations
See TODO for known issues and limitations