Nunchaku

Nunchaku is a high-performance inference engine from MIT-Han-Lab, optimized for 4-bit neural networks. It uses 4-bit SVDQuant quantization via DeepCompressor.

Nunchaku can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!

Important

- Nunchaku is supported only on CUDA platforms with Turing, Ampere, Ada, or Blackwell GPUs
- Nunchaku requires Python 3.11 or 3.12

Note

On Blackwell GPUs, Nunchaku defaults to FP4 methods. On other GPU generations, it uses INT4 methods.
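
The requirements above can be checked before installing. The following is a minimal sketch, not Nunchaku's or SD.Next's own detection logic: the helper names `python_supported` and `pick_precision` are hypothetical, and the mapping from CUDA compute capability to GPU generation (7.5 for Turing, 8.x for Ampere and Ada, 10.x and 12.x for Blackwell) is an assumption based on NVIDIA's published compute capability table. Nunchaku's real auto-detection may draw the line differently.

```python
import sys
from typing import Optional


def python_supported(version_info=sys.version_info) -> bool:
    """Return True for the Python versions listed above (3.11 or 3.12)."""
    return tuple(version_info[:2]) in {(3, 11), (3, 12)}


def pick_precision(major: int, minor: int) -> Optional[str]:
    """Map a CUDA compute capability to the quantization family described above.

    Hypothetical helper: the capability ranges are assumptions, not values
    read from Nunchaku itself.
    """
    if major in (10, 12):  # assumed Blackwell
        return "fp4"
    if (major, minor) == (7, 5) or major == 8:  # Turing; Ampere and Ada
        return "int4"
    return None  # generation not listed as supported


if __name__ == "__main__":
    print("Python supported:", python_supported())
    try:
        import torch  # optional; only needed to query the local GPU
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        print("Precision:", pick_precision(*torch.cuda.get_device_capability(0)))
```

If PyTorch is installed, `torch.cuda.get_device_capability(0)` supplies the `(major, minor)` pair for the first GPU; a result of `None` means the GPU generation is not in the supported list.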

Install

SD.Next attempts to auto-install prebuilt wheels when possible. If that fails, use the Manual build section.

Configure

To enable Nunchaku support, set quantization options in Settings -> Quantization:

  • Enable for modules: Enable or disable Nunchaku for specific modules. Currently supports Transformers and TE (text encoder).
  • Nunchaku attention: Replaces the current attention module with Nunchaku's custom FP16 attention.
  • Nunchaku offloading: Replaces the current offloading method with Nunchaku's custom offloading.

Support

At the moment, Nunchaku supports the following models:
- FLUX.1, both Dev and Schnell variants
- SANA 1.0-1600M variant
- Qwen Image, original and lightning variants
- T5 XXL variant text encoder, as used by SD35, FLUX.1, and HiDream models

Important

SD.Next auto-downloads Nunchaku's pre-quantized modules as needed on first access.
Nunchaku replaces a model's DiT module with a custom pre-quantized module, so any fine-tuning applied to the model's original DiT weights is ignored.
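
To illustrate why a fine-tune has no effect, here is a toy sketch of the replacement step. It does not use Nunchaku's real classes: `ToyPipeline`, `ToyTransformer`, and `swap_transformer` are hypothetical stand-ins. The only point is that once the DiT module is swapped for a pre-quantized one, the weights the original module carried no longer take part in inference.

```python
from dataclasses import dataclass


@dataclass
class ToyTransformer:
    source: str             # where the weights came from
    quantized: bool = False


@dataclass
class ToyPipeline:
    transformer: ToyTransformer      # stands in for the model's DiT module
    text_encoder: str = "unchanged"  # other components are not touched


def swap_transformer(pipe: ToyPipeline, prequantized: ToyTransformer) -> ToyPipeline:
    """Replace the DiT module in place; everything else is left alone."""
    pipe.transformer = prequantized
    return pipe


# A pipeline loaded from a fine-tuned checkpoint...
pipe = ToyPipeline(transformer=ToyTransformer(source="my-finetune"))
# ...ends up running the pre-quantized base weights after the swap.
swap_transformer(pipe, ToyTransformer(source="prequantized-base", quantized=True))
print(pipe.transformer.source)  # prequantized-base
```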

Note

For FLUX.1, Nunchaku supports multiple base models:
- Black Forest Labs FLUX.1 Dev and Schnell
- Shuttle Jaguar fine-tune

Notes

Warning

Nunchaku is EXPERIMENTAL and many normal features are not supported yet

Nunchaku is compatible with some advanced features, such as:
- LoRA loading: Nunchaku uses a custom LoRA loader, so not all LoRAs may be supported
- Para-attention first-block-cache: enable in Settings -> Pipeline modifiers

Unsupported features and known limitations:
- Batch size
- Model unload causes memory leaks

Manual build

Install CUDA

Warning

Building from source requires a CUDA development installation that includes NVCC

URL: https://developer.nvidia.com/cuda-12-6-3-download-archive
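
Before starting a build, it can help to confirm that `nvcc` is actually reachable from the shell that will run the build. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the parser assumes the usual `nvcc --version` line of the form `Cuda compilation tools, release 12.6, V12.6.85`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(version_text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def find_nvcc_release() -> Optional[Tuple[int, int]]:
    """Return the CUDA release reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = find_nvcc_release()
    if release is None:
        print("nvcc not found on PATH; install the CUDA toolkit first")
    else:
        print("Found nvcc for CUDA %d.%d" % release)
```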

Install docs

URL: https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source

Quick-steps

Note

The build process takes a while, so be patient

cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6

To verify the build, start Python and check the reported versions:

python

>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']
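
The same checks can be collected non-interactively, for example to attach to a bug report. This is a sketch rather than an SD.Next or Nunchaku utility: `environment_report` is a hypothetical helper that gathers the values shown in the session above and reports `None` for anything that is not installed instead of raising.

```python
import importlib.util
import platform
import sys


def environment_report() -> dict:
    """Collect the facts from the interactive session without failing on missing packages."""
    report = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "system": platform.system(),
        "torch": None,
        "cuda": None,
        "device": None,
        "nunchaku": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            report["device"] = torch.cuda.get_device_name(0)
    # find_spec locates the package without importing it
    spec = importlib.util.find_spec("nunchaku")
    if spec is not None:
        report["nunchaku"] = list(spec.submodule_search_locations or [])
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
```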