LTXVideo

Note

This document covers the optimized LTXVideo integration in the Video → LTX tab. Other LTXVideo models are available in the Video → Generic tab.

The optimized LTXVideo support is based on LTXVideo 0.9.7 13B, which is a 46.5 GB download. The model is downloaded automatically on first use.

Warning

Due to the model size, quantization and offloading are highly recommended.

Parameters

The model works best at resolutions under 1280×720 with fewer than 257 frames.
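As a rough illustration of picking parameters inside these limits, the sketch below rounds a requested size down to values the upstream LTX-Video README describes as native: width and height divisible by 32, and a frame count of the form 8k + 1. These constraints come from the upstream project, not from this page, and the helper name `snap_ltx_params` is hypothetical, so treat this as an assumption rather than what the UI does internally.

```python
def snap_ltx_params(width: int, height: int, frames: int) -> tuple[int, int, int]:
    """Round down to multiples of 32 (size) and to 8*k + 1 (frame count).

    Assumption: these constraints are taken from the upstream LTX-Video README.
    """
    width = max(32, (width // 32) * 32)
    height = max(32, (height // 32) * 32)
    frames = max(1, ((frames - 1) // 8) * 8 + 1)
    return width, height, frames


print(snap_ltx_params(1280, 720, 250))  # (1280, 704, 249)
```

Rounding down rather than up keeps the result within the limits stated above.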

Conditions

The model supports the default text-to-video workflow and can optionally use image-to-video or video-to-video conditioning. In video-to-video mode, you can set the maximum number of frames and skip every n-th frame from the input video.
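The two video-to-video controls can be pictured as a simple filter over the input frames. The sketch below is one possible reading, assuming that "skip every n-th frame" drops frames n, 2n, 3n and so on, and that the frame limit is applied afterwards. The function and parameter names are hypothetical, and the actual implementation may order or interpret these settings differently.

```python
def select_input_frames(frames: list, max_frames: int = 0, skip_nth: int = 0) -> list:
    """Drop every skip_nth-th frame (1-based), then keep at most max_frames.

    Assumption: a value of 0 disables the corresponding setting.
    """
    if skip_nth > 1:
        frames = [f for i, f in enumerate(frames, start=1) if i % skip_nth != 0]
    if max_frames > 0:
        frames = frames[:max_frames]
    return frames


# Ten input frames, drop every 3rd, keep the first five of what remains.
print(select_input_frames(list(range(10)), max_frames=5, skip_nth=3))  # [0, 1, 3, 4, 6]
```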

LoRA

Official LTXVideo LoRAs can be downloaded from HuggingFace. These include LoRAs to modify conditioning behavior, such as Canny/Pose/Depth LoRAs that work with pre-processed images or videos as conditioning input.

The official repo also provides a Distilled LoRA to reduce required step count and improve video quality.

Third-party LoRAs should work but are not tested.

Video Encoding

Important

Video support requires ffmpeg to be installed and available in the PATH.
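A quick way to confirm that ffmpeg is reachable is to look it up on the PATH from Python. This is a minimal sketch using only the standard library. The helper name `find_ffmpeg` is hypothetical and is not part of the application.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on the PATH."""
    return shutil.which("ffmpeg")


path = find_ffmpeg()
print(f"ffmpeg found at {path}" if path else "ffmpeg not found in PATH")
```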

Video location is set in Settings → Image Paths → Video.

Video is encoded using the selected codec and codec options. Default codec is libx264. Use the refresh button to see codecs available on your system. By default, the model does not create image files, but this can be enabled in video settings.
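To see outside the UI which video encoders an ffmpeg build offers, you can parse the output of `ffmpeg -encoders`. The sketch below assumes the usual table layout of that command, with a flags column followed by the encoder name. The function names are hypothetical, and the UI may well obtain its codec list by other means.

```python
import shutil
import subprocess


def parse_video_encoders(listing: str) -> list[str]:
    """Extract video encoder names from the text printed by `ffmpeg -encoders`."""
    names = []
    in_table = False
    for line in listing.splitlines():
        if line.strip().startswith("---"):
            in_table = True  # entries start after the dashed separator
            continue
        parts = line.split()
        if in_table and len(parts) >= 2 and parts[0].startswith("V"):
            names.append(parts[1])
    return names


def list_video_encoders() -> list[str]:
    """Run ffmpeg and return its video encoders, or an empty list if ffmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        return []
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return parse_video_encoders(result.stdout)


print(list_video_encoders())
```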

Tip

Hardware-accelerated codecs (e.g. hevc_nvenc) appear at the top of the list. Use them whenever possible.

Warning

Video encoding can be very memory intensive depending on the codec and number of frames.

Advanced Video Options

Any specified video options are sent to ffmpeg as-is. For example, the default crf:16 sets the constant rate factor, which trades video quality against file size (lower values give higher quality and larger files).

For details, see the FFmpeg encoding documentation.
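To make the pass-through concrete, the sketch below turns `key:value` options such as `crf:16` into the equivalent ffmpeg command-line arguments. It assumes that multiple options are separated by whitespace and that each maps to a `-key value` flag. The helper names and file names are hypothetical, and the application may hand the options to ffmpeg through a different interface.

```python
def parse_video_options(text: str) -> dict[str, str]:
    """Parse whitespace-separated key:value pairs, ignoring malformed items."""
    options = {}
    for item in text.split():
        key, sep, value = item.partition(":")
        if sep and key and value:
            options[key] = value
    return options


def build_ffmpeg_args(src: str, dst: str, codec: str = "libx264",
                      options_text: str = "crf:16") -> list[str]:
    """Build an ffmpeg argument list; options are forwarded without validation."""
    args = ["ffmpeg", "-y", "-i", src, "-c:v", codec]
    for key, value in parse_video_options(options_text).items():
        args += [f"-{key}", value]
    args.append(dst)
    return args


print(build_ffmpeg_args("input.mp4", "output.mp4"))
# ['ffmpeg', '-y', '-i', 'input.mp4', '-c:v', 'libx264', '-crf', '16', 'output.mp4']
```

Because nothing is validated, an option the selected codec does not understand would only surface as an ffmpeg error at encode time.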

Interpolation

Video can optionally have additional interpolated frames added using the RIFE interpolation method. For example, a 10-second 30fps video normally requires 300 frames. Setting 3 interpolated frames means only 100 frames are generated, with 200 additional frames interpolated between them. The final video fps and duration remain unchanged.
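The arithmetic in this example can be sketched as follows. The code reads the interpolation setting as the factor by which the number of generated frames is reduced, which is what the numbers above imply. The exact mapping used by the UI is an assumption here, and the function name is hypothetical.

```python
def frames_to_generate(duration_s: float, fps: int, interpolate: int) -> tuple[int, int]:
    """Return (generated, interpolated) frame counts for a target duration and fps.

    Assumption: the setting divides the number of frames the model must generate.
    """
    total = round(duration_s * fps)
    if interpolate <= 1:
        return total, 0
    generated = -(-total // interpolate)  # ceiling division
    return generated, total - generated


print(frames_to_generate(10, 30, 3))  # (100, 200)
```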