LTXVideo

Note

This document covers the optimized LTXVideo integration in SD.Next, available in the Video->LTX tab
Other LTXVideo models are available through the generic integration in the Video->Generic tab

Optimized LTXVideo support is based on LTXVideo 0.9.7 13B, which is 46.5GB in size
The model is downloaded automatically on first use

Warning

Due to model size, quantization and offloading are highly recommended
See the docs for more details on offloading and quantization

Parameters

The model works best at resolutions under 1280x720 and with fewer than 257 frames
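
As a rough illustration of the limits above, the following Python sketch checks a requested size and frame count before generation. The helper name `check_ltx_request` is hypothetical and not part of SD.Next, and comparing total pixel count with inclusive bounds is an assumption about how the recommendation could be applied.

```python
# Hypothetical helper for illustration only; not an SD.Next API.
MAX_PIXELS = 1280 * 720  # recommended ceiling taken from the text above
MAX_FRAMES = 257         # recommended ceiling taken from the text above


def check_ltx_request(width: int, height: int, frames: int) -> list[str]:
    """Return human-readable warnings for settings beyond the recommended range."""
    warnings = []
    # Assumption: the resolution limit is compared by total pixel count
    if width * height > MAX_PIXELS:
        warnings.append(f"resolution {width}x{height} exceeds the recommended 1280x720")
    if frames > MAX_FRAMES:
        warnings.append(f"{frames} frames exceeds the recommended maximum of {MAX_FRAMES}")
    return warnings


if __name__ == "__main__":
    for message in check_ltx_request(1920, 1080, 300):
        print("warning:", message)
```

Settings outside this range may still run, so the sketch returns warnings instead of raising an error.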

Conditions

The model supports the default text-to-video workflow and can optionally use image-to-video or video-to-video conditioning
In the video-to-video workflow, you can set the maximum number of frames and choose to skip every n-th frame of the input video
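
The two video-to-video controls can be pictured as a simple filter over the input frames. The sketch below is one possible reading, not the SD.Next implementation: it assumes that a skip value of `n` drops every n-th frame, that the frame cap is applied after skipping, and that a value of 0 disables either control. The function name `select_frames` is hypothetical.

```python
# Hypothetical sketch of input-frame selection; the actual SD.Next logic may differ.
def select_frames(frames: list, max_frames: int = 0, skip: int = 0) -> list:
    """Drop every `skip`-th frame, then keep at most `max_frames` frames."""
    # Assumption: skip values of 0 or 1 are treated as "disabled",
    # since dropping every 1st frame would remove the whole video
    if skip > 1:
        frames = [f for i, f in enumerate(frames, start=1) if i % skip != 0]
    # Assumption: the cap is applied after skipping; 0 means no cap
    if max_frames > 0:
        frames = frames[:max_frames]
    return frames


if __name__ == "__main__":
    source = list(range(1, 11))  # stand-in for 10 decoded video frames
    print(select_frames(source, max_frames=4, skip=3))
```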

LoRA

Support includes official LTXVideo LoRAs that can be downloaded from HuggingFace
This includes LoRAs that can be used to change behavior of conditioning
For example, you can load a Canny, Pose, or Depth LoRA and then use a pre-processed image or video as the conditioning input

The official repo also provides a Distilled LoRA, which can be used to reduce the required step count and improve the quality of the generated video

Third-party LoRAs should work as well, but have not been tested

Video Encode

Important

Video support requires ffmpeg to be installed and available in the PATH
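
To confirm that ffmpeg is visible on the PATH, a quick check with Python's standard `shutil.which` is enough. The helper name `find_executable` is chosen here for illustration and is not part of SD.Next.

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of `name` if it is on the PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("ffmpeg was not found on the PATH; video encoding will not work")
    else:
        print(f"ffmpeg found at {path}")
```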

Video location is set in settings -> image paths -> video

Video is encoded using the selected codec and codec options
The default codec is libx264; use refresh to see the codecs available on your system
By default, image files are not created, but this can be enabled in video settings

Tip

Hardware-accelerated codecs (e.g. hevc_nvenc) will be at the top of the list
Use hardware-accelerated codecs whenever possible

Warning

Video encoding can be very memory intensive depending on codec and number of frames

Advanced Video Options

Any specified video options will be sent to ffmpeg as-is
For example, the default crf:16 sets the trade-off between video quality and compression; lower values give higher quality and larger files
For details, see https://trac.ffmpeg.org/wiki#Encoding
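
To show what passing options as-is can look like, the sketch below turns `key:value` pairs into the equivalent ffmpeg command-line flags. Several things here are assumptions for illustration: the comma as separator between multiple options, the helper names `parse_video_options` and `ffmpeg_args`, and the use of the ffmpeg CLI itself, since SD.Next may hand the options to ffmpeg through a different interface.

```python
from typing import Optional


# Hypothetical helpers for illustration; not SD.Next APIs.
def parse_video_options(text: str) -> dict[str, str]:
    """Parse a string such as 'crf:16, preset:slow' into a dict of options."""
    options = {}
    # Assumption: multiple options are separated by commas
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {item!r}")
        options[key.strip()] = value.strip()
    return options


def ffmpeg_args(src: str, dst: str, codec: str = "libx264",
                options: Optional[dict[str, str]] = None) -> list[str]:
    """Build an ffmpeg argument list; each option becomes '-key value' unchanged."""
    args = ["ffmpeg", "-i", src, "-c:v", codec]
    for key, value in (options or {}).items():
        args += [f"-{key}", value]
    args.append(dst)
    return args


if __name__ == "__main__":
    opts = parse_video_options("crf:16")
    print(" ".join(ffmpeg_args("input.mp4", "output.mp4", options=opts)))
```

Because the values are not validated, an option the selected codec does not understand is only reported by ffmpeg at encode time.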

Interpolation

Video can optionally have additional frames added using the RIFE interpolation method
For example, a 10-second 30fps video rendered with 0 interpolated frames requires 300 generated frames
If you set 3 interpolated frames, the video fps and duration do not change, but only 100 frames need to be generated; the additional 200 interpolated frames are added in between the generated frames
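
The frame counts in this example can be reproduced with a few lines of arithmetic. The sketch assumes the interpolation setting divides the total frame count (300 / 3 = 100), which is inferred from the numbers in the example and not from the SD.Next source; the function name `frames_to_generate` is hypothetical.

```python
# Hypothetical helper reproducing the worked example; not an SD.Next API.
def frames_to_generate(duration_s: int, fps: int, interpolate: int) -> tuple[int, int]:
    """Return (generated, interpolated) frame counts for a target duration and fps."""
    total = duration_s * fps
    if interpolate <= 0:
        return total, 0
    # Assumption: the setting acts as a divisor of the total frame count,
    # matching the 300 -> 100 + 200 split in the example; rounding is arbitrary here
    generated = total // interpolate
    return generated, total - generated


if __name__ == "__main__":
    print(frames_to_generate(10, 30, 0))
    print(frames_to_generate(10, 30, 3))
```

Since generation is far more expensive than interpolation, fewer generated frames means a shorter render at the same output length.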