
FramePack

Implementation of lllyasviel's FramePack for Tencent HunyuanVideo I2V, with major differences and improvements:

- T2V, I2V & FLF2V modes support
- Bi-directional and forward-only (F1) model variants
- Resolution and frame-rate scaling: output at any supported aspect ratio and target resolution
- Prompt enhancer: use an LLM to expand short prompts
- Complex actions: modify the prompt for each section of the video
- Video encode: multiple video codecs, raw export, frame export, frame interpolation
- LoRA: support for LoRA models
- Offloading and quantization: on-the-fly quantization and balanced offloading
- API: use via HTTP REST API calls
- Custom model: support for custom model loading


Important

Video support requires ffmpeg to be installed and available in the PATH

Modes

Supports 3 modes of operation:

T2V: text-to-video

  • Uses only text prompt and generates video from it
  • Automatically triggered if no init image is provided

I2V: image-to-video

  • Uses image as init frame and text prompt to generate video
  • Automatically triggered if init image is provided and there is no end frame
  • Init strength controls the strength of the image, similar to img2img denoising strength
  • Vision strength controls the strength of the vision model that controls video generation

FLF2V: first-last-frame-to-video

  • Uses images as the init and end frames plus a text prompt to generate video
  • Automatically triggered if init image and end frame are provided
  • Init strength controls the strength of the image, similar to img2img denoising strength
  • End strength controls the strength of the end frame compared to init frame
  • Vision strength controls the strength of the vision model that controls video generation
  • Ratio of init and end strengths can be used to skew video towards init or end frame
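
A minimal sketch of these auto-selection rules; the function and names are illustrative, not the extension's internals:

```python
def select_mode(init_image, end_image) -> str:
    """Pick the generation mode from which images are provided."""
    if init_image is None:
        return "T2V"    # no init image -> text-to-video
    if end_image is None:
        return "I2V"    # init image only -> image-to-video
    return "FLF2V"      # init + end frame -> first-last-frame-to-video
```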

Variants

Both FramePack variants are based on the HunyuanVideo model, but use different generation approaches:

- Bi-directional model (default): runs generation in reverse order and assembles the video
- Forward-only model (F1): runs generation in forward order and assembles the video

Resolution Scaling

The video model is trained at 640p, but can generate at other resolutions.
The resolution must still match a supported aspect ratio.
Given any input image, the model first finds the closest supported aspect ratio, then scales to the target resolution.
As a result, the output resolution is the closest supported match to your requested size, not necessarily the exact size you entered.
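
A rough sketch of this selection logic; the ratio list and the short-side scaling shown here are assumptions, not the model's actual table:

```python
# Hypothetical list of supported aspect ratios (width, height)
SUPPORTED_RATIOS = [(1, 1), (4, 3), (3, 4), (16, 9), (9, 16)]

def output_size(in_w: int, in_h: int, target: int = 640) -> tuple[int, int]:
    """Snap the input image to the closest supported ratio, then scale to target."""
    ratio = in_w / in_h
    w, h = min(SUPPORTED_RATIOS, key=lambda r: abs(r[0] / r[1] - ratio))
    scale = target / min(w, h)   # assumption: scale so the short side hits target
    return round(w * scale), round(h * scale)

# output_size(1920, 1080) -> closest ratio is 16:9 -> (1138, 640)
```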

Note

VRAM usage scales with resolution, so if you have low VRAM, use a lower resolution

Frame rate can be set to any value. It controls both generated frame count and encoded playback speed.

Prompt Enhancer

Uses a VLM to expand short prompts. For example, a terse prompt such as dancing or jumping can be expanded into a detailed one.
VLM first analyzes the input image (if provided), then generates a longer prompt that combines image context and your short prompt.

Complex Actions

When changing duration or FPS, the model prints how many sections will be generated.
Each video section can have its own prompt suffix, which can be used to change the prompt over time.
A prompt suffix is a string appended to the end of the main prompt.
Each line in the section prompts is used as a separate suffix.
For example, with 3 sections and 3 lines, each section uses a different line.

Example:
- main prompt: astronaut on the moon
- section prompts:
  - line-1: walking
  - line-2: jumping

The number of lines in section prompts does not need to match the number of sections.
If there are fewer lines than sections, prompts are interpolated to stretch across the full video duration.
For example, with 4 sections and 2 lines, the first line is used for sections 1-2 and the second line for sections 3-4.
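
A minimal sketch of this stretching rule; the extension's actual implementation may differ:

```python
def suffix_for_section(lines: list[str], section: int, total_sections: int) -> str:
    """Map a section index onto the prompt lines proportionally."""
    # e.g. 2 lines over 4 sections -> line indices 0, 0, 1, 1
    return lines[section * len(lines) // total_sections]
```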

Section prompts are optional, but useful for more complex videos.
They are compatible with Prompt Enhancer. In that mode, each combined prompt (base prompt plus section suffix) is enhanced separately.

Video Encode

Video output location is set in Settings -> Image paths -> Video.

Video is encoded with the selected codec and codec options.
The default codec is libx264. To see codecs available on your system, use refresh.
By default, the model does not save image files, but this can be enabled in video settings.

Tip

Hardware-accelerated codecs (e.g. hevc_nvenc) will be at the top of the list
Use hardware-accelerated codecs whenever possible

Warning

Video encoding can be memory-intensive depending on codec and frame count

Advanced Video Options

Any specified video options are passed to ffmpeg as-is.
For example, the default crf:16 controls quality versus compression; lower values mean higher quality.
For details, see https://trac.ffmpeg.org/wiki#Encoding
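
To illustrate the pass-through behavior, here is a minimal sketch of how an option string such as crf:16 might be turned into ffmpeg arguments; the comma-separated format and the parsing shown are assumptions, not the extension's actual implementation:

```python
def options_to_args(options: str) -> list[str]:
    """Turn "key:value" pairs into ffmpeg-style command-line arguments."""
    args: list[str] = []
    for pair in options.split(","):      # assumption: comma-separated pairs
        key, value = pair.split(":", 1)
        args += [f"-{key.strip()}", value.strip()]
    return args

# options_to_args("crf:16") -> ["-crf", "16"], passed to ffmpeg as-is
```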

Interpolation

Interpolated frames can optionally be added to the video using RIFE.
For example, a 10-second 30 fps video with no interpolation requires 300 generated frames.
With interpolation set to 3, fps and duration stay the same, but only 100 frames are generated and the remaining 200 frames are interpolated between them.
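
The arithmetic from the example above, spelled out (treating the setting as an overall interpolation factor, which is an assumption consistent with the numbers in the text):

```python
duration_s, fps, factor = 10, 30, 3
total_frames = duration_s * fps          # 300 frames in the encoded video
generated = total_frames // factor       # 100 frames produced by the model
interpolated = total_frames - generated  # 200 frames filled in by RIFE
```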

CLI Video Encode

Video encoding can be skipped by setting the codec to none.
In that case, you may want to save raw frames as a safetensors file and use a command-line utility to encode the video later.

python encode-video.py

Allows you to:
- Export frames from a safetensors file as individual images, which can be used for further processing or to manually create a video from an image sequence with ffmpeg
- Encode frames from a safetensors file into video using cv2
- Encode frames from a safetensors file into video using torchvision/ffmpeg
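
As a rough illustration of the cv2 encoding path, a minimal sketch of reading frames from a safetensors file and writing a video; the "frames" key, uint8 (N, H, W, 3) RGB layout, and fixed fps are assumptions, not the extension's actual file format:

```python
import cv2
from safetensors.numpy import load_file

frames = load_file("frames.safetensors")["frames"]  # assumed (N, H, W, 3) uint8 RGB
h, w = frames.shape[1:3]
writer = cv2.VideoWriter("out.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (w, h))
for frame in frames:
    writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))  # cv2 expects BGR order
writer.release()
```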

LoRA

Limited support for HunyuanVideo LoRAs.
Effects are limited unless the LoRA is trained on FramePack itself.
Uses standard syntax: <lora:filename:weight>
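
For example, with a hypothetical LoRA file named my-dance-lora, adding <lora:my-dance-lora:0.8> to the prompt applies it at weight 0.8:

astronaut dancing on the moon <lora:my-dance-lora:0.8>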

Note

There is no networks panel available in FramePack, so you have to add the LoRA to the prompt manually

Offloading and Quantization

The implementation replaces lllyasviel's offloading with SD.Next balanced offloading.
Balanced offload uses more resources, but on most GPUs it is significantly faster, especially with quantization.

Adds on-the-fly quantization support for LLM and DiT/Video modules.
Available only with native offloading; configure it in Settings -> Quantization.

Tip

It is recommended to enable quantization for Model, TE, and LLM modules.

See docs for more details on offloading and quantization.

API

The extension supports API calls at /sdapi/v1/framepack.
The only required parameters are a base64-encoded init image and a prompt; all others are optional.
After video generation, download the result using the /file={path-to-file} endpoint.
For example, see create-video.py

python create-video.py --help
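
A minimal sketch of a direct API call using Python requests, assuming a default local server at http://127.0.0.1:7860; the JSON field names and response schema are assumptions, so check create-video.py for the actual parameters:

```python
import base64
import requests

BASE = "http://127.0.0.1:7860"  # assumption: default local SD.Next address

with open("init.jpg", "rb") as f:
    init_image = base64.b64encode(f.read()).decode()

# Field names are assumptions; only the init image and prompt are required
resp = requests.post(
    f"{BASE}/sdapi/v1/framepack",
    json={"image": init_image, "prompt": "astronaut on the moon"},
)
resp.raise_for_status()

video_path = resp.json()["path"]  # assumption: server returns the output path
video = requests.get(f"{BASE}/file={video_path}")
with open("output.mp4", "wb") as out:
    out.write(video.content)
```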

Custom Model

You can get the current recipe to see which modules are loaded and adjust them if needed.
For example, swapping the original llama text encoder for a different one can be done with:

text_encoder: Kijai/llava-llama-3-8b-text-encoder-tokenizer/
tokenizer: Kijai/llava-llama-3-8b-text-encoder-tokenizer/