Video
Video models are large and resource-intensive
The biggest resource issue is the final decode, since video models are typically designed to decode the entire generated video at once to achieve temporal consistency
To reduce resource requirements, reduce the number of generated frames and/or the resolution
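As a rough illustration of why frame count and resolution matter, the sketch below drives one of the underlying diffusers pipelines directly; SD.Next exposes the equivalent options through its UI, and the model id and values here are illustrative assumptions, not SD.Next defaults:

```python
# Illustrative sketch using the diffusers CogVideoX pipeline (not SD.Next code)
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.to("cuda")

# Tiled/sliced VAE decode avoids decoding the whole clip in a single pass,
# directly reducing the final-decode memory spike described above
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(
    prompt="a red panda walking through a bamboo forest, morning light",
    num_frames=25,            # reduced from the model default of 49
    height=480,
    width=720,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

Tiled and sliced decode trade a small amount of speed for a much lower memory peak during the final decode step.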
SD.Next support for video models is relatively basic, with further optimizations pending community interest
Any future optimizations would likely have to go into partial loading and execution instead of offloading inactive parts of the model
Warning
Any use on GPUs with less than 16GB VRAM or on systems with less than 48GB RAM is considered experimental
Note
The latest video models use LLMs for prompt processing and as a result require very long and descriptive prompts
Tip
You may need to enable sequential offload for maximum GPU memory savings, or use balanced offload with the min/max watermarks reduced as far as possible
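In diffusers terms, the offload modes correspond roughly to the calls below; this is a sketch only, since SD.Next configures offload through its settings rather than user code, and its balanced offload with watermarks has no direct diffusers equivalent:

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Choose one offload mode; they are mutually exclusive:
pipe.enable_sequential_cpu_offload()    # maximum savings: streams individual layers, much slower
# pipe.enable_model_cpu_offload()       # milder savings: swaps whole submodels between GPU and CPU
```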
Tip
Optionally enable pre-quantization using bitsandbytes (bnb) for additional memory savings
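A minimal sketch of what bnb pre-quantization looks like at the diffusers level (assumes a recent diffusers release with bitsandbytes installed; in SD.Next this is a settings toggle, not user code):

```python
# Hypothetical example: load a video transformer pre-quantized to 4-bit NF4
import torch
from diffusers import BitsAndBytesConfig, CogVideoXTransformer3DModel

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b",
    subfolder="transformer",
    quantization_config=quant,
    torch_dtype=torch.float16,
)
# the quantized transformer can then be passed to the pipeline via
# CogVideoXPipeline.from_pretrained(..., transformer=transformer)
```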
Supported models
All video models are available as individually selectable scripts in either the text or image interface
- Stable Video Diffusion
  support for base, XT 1.0 and XT 1.1
- CogVideoX
  support for 2B and 5B text-to-video and 5B image-to-video
- Lightricks LTX-Video
  model size: 27.75 GB
  support for text-to-video and image-to-video
  reference values: steps 50, width 704, height 512, frames 161, guidance scale 3.0 (see the sketch after this list)
- Hunyuan Video
  model size: 40.92 GB
  support for text-to-video
  reference values: steps 50, width 1280, height 720, frames 129, guidance scale 6.0
- Genmo Mochi.1 Preview
  model size: 68.87 GB
  support for text-to-video
  reference values: steps 64, width 848, height 480, frames 19, guidance scale 4.5
- VGen
- AnimateDiff
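As an example of applying the reference values, here is a minimal text-to-video sketch for LTX-Video using its diffusers pipeline; the repo id Lightricks/LTX-Video and the fps value are assumptions, while the generation parameters come from the list above:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # see the offload tip above

video = pipe(
    prompt="a long, highly detailed scene description, as these models expect",
    width=704,
    height=512,
    num_frames=161,
    num_inference_steps=50,
    guidance_scale=3.0,
).frames[0]
export_to_video(video, "ltx.mp4", fps=24)
```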
Interpolation
For all video models, SD.Next supports inserting interpolated frames into the generated video for smoother output
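The principle can be illustrated with a naive linear-blend interpolator; real interpolation models estimate motion between frames and produce far better in-betweens, so this sketch only shows where the extra frames are inserted:

```python
import numpy as np

def interpolate(frames: list[np.ndarray], factor: int = 2) -> list[np.ndarray]:
    """Insert (factor - 1) blended frames between each pair of originals."""
    out: list[np.ndarray] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for i in range(1, factor):
            t = i / factor  # blend weight for the in-between frame
            blend = (1 - t) * a.astype(np.float32) + t * b.astype(np.float32)
            out.append(blend.astype(np.uint8))
    out.append(frames[-1])
    return out
```

Doubling the frame count this way (factor=2) lets the same clip play back at twice the frame rate for smoother apparent motion.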