
Video

SD.Next supports video creation using the top-level Video tab
Support includes T2V (text-to-video) and I2V (image-to-video)

Tip

The latest video models use LLMs for prompting and therefore require very long and descriptive prompts
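
As an illustrative example (not taken from any model's official docs), a T2V prompt in this style might read:

```
A cinematic tracking shot of a red vintage convertible driving along a coastal
road at sunset. The camera follows from behind, then slowly pans to the side
to reveal ocean waves crashing against cliffs. Warm golden light, soft lens
flare, shallow depth of field, smooth motion, photorealistic detail.
```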

Supported models

SD.Next supports the following models out-of-the-box:
- Hunyuan: HunyuanVideo, FastHunyuan, SkyReels | T2V, I2V
- WAN21: 1.3B, 14B | T2V, I2V
- LTXVideo: 0.9.0, 0.9.1, 0.9.5 | T2V, I2V
- CogVideoX: 2B, 5B | T2V, I2V
- Allegro: T2V
- Mochi1: T2V
- Latte1: T2V

Note

All models are auto-downloaded upon first use
The download location is the folder specified in Settings -> System paths -> Huggingface

Reference list

| Engine | Model | Type | Size | Optimal Resolution | Default Sampler | Reference Values | Special Notes |
|--------|-------|------|------|--------------------|-----------------|------------------|---------------|
| Hunyuan | HunyuanVideo | T2V | 40.9GB | 1280x720 | Euler FlowMatch | Frames:129 CFG:6.0 Steps:50 | N/A |
| Hunyuan | HunyuanVideo | I2V | 59.2GB | 1280x720 | Euler FlowMatch | Frames:129 CFG:1.0 Steps:50 | Issue: transformers version / TBD 16ch VAE |
| Hunyuan | FastHunyuan | T2V | 25.0GB+15GB | 1280x720 | Euler FlowMatch | Frames:125 CFG:6.0 True:1.0 Shift:17 Steps:6 | N/A |
| Hunyuan | SkyReels | T2V | 25.0GB+15GB | 960x544 | Euler FlowMatch | Frames:97 CFG:1.0 True:6.0 Steps:50 | N/A |
| Hunyuan | SkyReels | I2V | 25.0GB+15GB | 960x544 | Euler FlowMatch | Frames:97 CFG:1.0 True:6.0 Steps:50 | N/A |
| WAN21 | WAN 2.1 1.3B | T2V | 28.2GB | 832x480 | UniPC | Frames:81 CFG:5.0 Steps:50 | N/A |
| WAN21 | WAN 2.1 14B | T2V | 78.1GB | 1280x720 | UniPC | Frames:81 CFG:5.0 Steps:50 | N/A |
| WAN21 | WAN 2.1 14B 480p | I2V | | 832x480 | UniPC | Frames:81 CFG:5.0 Steps:50 | |
| WAN21 | WAN 2.1 14B 720p | I2V | | 1280x720 | UniPC | Frames:81 CFG:5.0 Steps:50 | |
| LTXVideo | LTXVideo 0.9.0 | T2V | | 704x480 | Euler FlowMatch | Frames:161 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.0 | I2V | | 704x480 | Euler FlowMatch | Frames:161 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.1 | T2V | 24.1GB | 704x512 | Euler FlowMatch | Frames:161 CFG:3 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.1 | I2V | 24.1GB | 704x512 | Euler FlowMatch | Frames:161 CFG:3 Steps:50 | N/A |
| LTXVideo | LTXVideo 0.9.5 | T2V | 24.8GB | 768x512 | Euler FlowMatch | Frames:161 Steps:40 | N/A |
| LTXVideo | LTXVideo 0.9.5 | I2V | | 768x512 | Euler FlowMatch | Frames:161 Steps:40 | N/A |
| CogVideoX | CogVideoX 1.0 2B | T2V | | 720x480 | Cog DDIM | Frames:49 CFG:6.0 Steps:50 | N/A |
| CogVideoX | CogVideoX 1.0 5B | T2V | | 720x480 | Cog DDIM | Frames:49 CFG:6.0 Steps:50 | N/A |
| CogVideoX | CogVideoX 1.0 5B | I2V | | 720x480 | Cog DDIM | Frames:49 CFG:6.0 Steps:50 | N/A |
| CogVideoX | CogVideoX 1.5 5B | T2V | 30.3GB | 1360x768 | Cog DDIM | Frames:81 CFG:6.0 Steps:50 | Issue: blank output |
| CogVideoX | CogVideoX 1.5 5B | I2V | | 1360x768 | Cog DDIM | Frames:81 CFG:6.0 Steps:50 | Issue: blank output |
| Allegro | Allegro | T2V | 24.7GB | 1280x720 | Euler a | Frames:88 CFG:7.5 Steps:100 | Issue: blank output |
| Mochi | Mochi1 | T2V | 23.4GB | 512x512 | Euler FlowMatch | Frames:16 CFG:7.5 Steps:50 | N/A |
| Latte | Latte1 | T2V | 23.4GB | 512x512 | DDIM | Frames:16 CFG:7.5 Steps:50 | |

Tip

Each model may require a specific resolution or specific parameters to produce quality results
This includes advanced parameters such as sampler shift, which in normal text-to-image use would typically not need tweaking
See each model's original notes for parameter recommendations

Note

It is recommended to use Default as the sampler for all models unless you need to change a specific sampler setting
For example, to change the sampler shift, you need to select the appropriate sampler for that model

Legacy models

Additional video models are available as individually selectable scripts in either the text or image interface

LoRA

SD.Next includes LoRA support for Hunyuan, LTX, WAN, Mochi, and Cog

See LoRA for more details
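
As a hedged sketch of what happens underneath, this is roughly how a video LoRA is attached in plain diffusers terms; SD.Next handles this through its own Networks UI, and the LoRA path, adapter name, and weight below are illustrative:

```python
import torch
from diffusers import LTXPipeline

# Load a video pipeline, then attach a LoRA via the standard diffusers loader mixin
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("path/to/my_ltx_lora.safetensors", adapter_name="my-lora")  # illustrative path/name
pipe.set_adapters(["my-lora"], adapter_weights=[0.8])  # illustrative blend strength
```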

Optimizations

Warning

Any use on GPUs with less than 16GB of VRAM or systems with less than 48GB of RAM is experimental

Memory

Offloading helps by moving data between system RAM and GPU VRAM as needed
However, there is no way around the requirement that the entire model must be loaded into system RAM before it can be used
Check the total model size in the table above and make sure you have enough RAM to load the model
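
A minimal illustrative pre-flight check for this requirement (the 78.1GB figure is the WAN 2.1 14B entry from the reference table; `psutil` is assumed to be available):

```python
import psutil

model_size_gb = 78.1  # e.g. WAN 2.1 14B, total size from the reference table
available_gb = psutil.virtual_memory().available / 1024**3
if available_gb < model_size_gb:
    print(f"Not enough free RAM: need ~{model_size_gb:.0f}GB, have {available_gb:.1f}GB")
```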

Offloading

Enable offloading so model components can be moved in and out of VRAM as needed
Most models support all offloading types:
- Balanced: recommended, but may require extra tuning
- Model: simplest
- Sequential: highest memory savings, but slowest

See Offload for more details
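
For reference, a minimal sketch of the Model and Sequential modes in plain diffusers terms; Balanced offload is SD.Next's own implementation and has no single diffusers call, and the model repo below is the community diffusers-format HunyuanVideo:

```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()        # Model: swaps whole components (transformer, VAE, ...) into VRAM on demand
# pipe.enable_sequential_cpu_offload()  # Sequential: swaps individual layers; lowest VRAM use, slowest
```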

Quantization

Enable on-the-fly quantization during load in Settings -> Quantization for additional memory savings:
- BnB
- TorchAO
- Optimum-Quanto

You can enable quantization for the Transformer and Text-Encoder modules together or separately:
- Most T2V and I2V models support on-the-fly quantization of the transformer module
- Most T2V models support quantization of the text encoder, while I2V models may not due to the inability to quantize image vectors

See Quantization for more details
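
As a hedged sketch of what on-the-fly BnB quantization does during load, in plain diffusers terms (SD.Next applies the equivalent automatically when enabled in Settings -> Quantization):

```python
import torch
from diffusers import BitsAndBytesConfig, HunyuanVideoTransformer3DModel

# Weights are quantized to 4-bit as they are loaded, instead of after loading in full precision
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```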

Decoding

Instead of using the full VAE packaged with the model to decode the final frames, SD.Next also supports using a Tiny VAE or a Remote VAE to decode video

  • Tiny VAE: support for Hunyuan, WAN, Mochi
  • Remote VAE: support for Hunyuan

See VAE for more details
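
As a hedged sketch, remote decoding in plain diffusers terms looks roughly like this; the endpoint URL is a placeholder and the exact arguments may differ by diffusers version, so consult the Hugging Face Hybrid Inference docs for the current HunyuanVideo endpoint:

```python
from diffusers.utils.remote_utils import remote_decode

# latents: the pipeline output when run with output_type="latent"
video = remote_decode(
    endpoint="https://<hunyuan-vae-endpoint>.endpoints.huggingface.cloud/",  # placeholder URL
    tensor=latents,
    output_type="mp4",  # frames are decoded server-side and returned as encoded video bytes
)
with open("output.mp4", "wb") as f:
    f.write(video)
```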

Processing

SD.Next supports two types of optional processing acceleration, shown in the sketch below:
- FasterCache: supported for Hunyuan, Mochi, Latte, Allegro, Cog, WAN, LTX
- PyramidAttentionBroadcast: supported for Hunyuan, Mochi, Latte, Allegro, Cog, WAN, LTX
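
As a hedged sketch, enabling one of these caches in recent diffusers (which SD.Next wraps) looks roughly like this; the skip ranges shown are illustrative values, and `pipe` is assumed to be an already-loaded video pipeline with a cache-capable transformer:

```python
from diffusers import PyramidAttentionBroadcastConfig

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,              # reuse attention states every other block
    spatial_attention_timestep_skip_range=(100, 800),  # only skip within this timestep window
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```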

Interpolation

For all video models, SD.Next supports adding interpolated frames to the video for smoother output
Interpolation (if enabled) is performed using RIFE: Real-Time Intermediate Flow Estimation
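
RIFE itself synthesizes motion-aware intermediate frames from estimated optical flow; as a conceptual illustration only, this naive linear blend shows where the extra frames are inserted:

```python
import numpy as np

def interpolate_2x(frames: list[np.ndarray]) -> list[np.ndarray]:
    """Insert one midpoint frame between each pair: N frames -> 2N-1 frames."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        mid = ((a.astype(np.float32) + b.astype(np.float32)) / 2).astype(a.dtype)
        out.append(mid)  # RIFE would instead synthesize this frame from estimated motion
    out.append(frames[-1])
    return out
```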

Issues/Limitations

See TODO for known issues and limitations