Nunchaku
Nunchaku is a high-performance inference engine from MIT-Han-Lab, optimized for 4-bit neural networks. It uses 4-bit SVDQuant quantization via DeepCompressor.
Nunchaku can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!
Important
Nunchaku is supported only on CUDA platforms using Turing/Ampere/Ada/Blackwell GPUs
Nunchaku requires Python 3.11 or 3.12
Note
On Blackwell GPUs, Nunchaku defaults to FP4 methods.
On other GPU generations, it uses INT4 methods.
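The platform requirements above can be sketched as a small pre-flight check. This is an illustration only, not the detection logic used by SD.Next or Nunchaku: the helper names `python_supported` and `pick_precision` are hypothetical, and the mapping from CUDA compute capability to GPU generation (7.5 for Turing, 8.x for Ampere and Ada, 10 and above for Blackwell) is an assumption based on NVIDIA's published capability numbers.

```python
import sys


def python_supported(version=None):
    """Return True for Python 3.11 or 3.12, the versions listed above."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in {(3, 11), (3, 12)}


def pick_precision(major, minor):
    """Map a CUDA compute capability to 'int4', 'fp4', or None (unsupported).

    Assumed mapping (not taken from Nunchaku itself):
      7.5          -> Turing       -> int4
      8.x          -> Ampere / Ada -> int4
      10 and above -> Blackwell    -> fp4
    """
    if major >= 10:
        return "fp4"
    if (major, minor) == (7, 5) or major == 8:
        return "int4"
    return None  # e.g. Volta (7.0) or Hopper (9.0), which are not listed above


if __name__ == "__main__":
    print("python ok:", python_supported())
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print("capability:", cap, "->", pick_precision(*cap))
        else:
            print("CUDA is not available")
```

Keeping the two helpers free of any torch dependency means they can be exercised on a machine without a GPU; only the `__main__` part queries the real device.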
Install
SD.Next attempts to auto-install prebuilt wheels when possible. If that fails, use the Manual build section.
Configure
To enable Nunchaku support, set quantization options in Settings -> Quantization:
- Enable for modules: Enable or disable Nunchaku for specific modules. Currently supports Transformers and TE (text encoder).
- Nunchaku attention: Replaces the current attention module with Nunchaku's custom FP16 attention.
- Nunchaku offloading: Replaces the current offloading method with Nunchaku's custom offloading.
Support
At the moment, Nunchaku supports the following models:
- FLUX.1 both Dev and Schnell variants
- SANA 1.0-1600M variant
- Qwen Image original and lightning variants
- T5 XXL text encoder
as used by SD35, FLUX.1, and HiDream models
Important
SD.Next will auto-download Nunchaku's pre-quantized modules as needed on first access.
Nunchaku replaces a model's DiT module with a custom pre-quantized module,
so any fine-tuning present in the loaded model's DiT is ignored.
Note
For FLUX.1, Nunchaku supports multiple base models:
- Black Forest Labs FLUX.1 Dev and Schnell
- Shuttle Jaguar finetune
Notes
Warning
Nunchaku is EXPERIMENTAL and many normal features are not supported yet
Nunchaku is compatible with some advanced features like:
- LoRA loading
however, Nunchaku uses a custom LoRA loader, so some LoRAs may not be supported
- Para-attention first-block-cache
enable in Settings -> Pipeline modifiers
Unsupported features and known limitations:
- Batch size
- Model unload causes memory leaks
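Since batch size is listed as unsupported, one way to get several images is to run single-image requests back to back. The sketch below shows that pattern only; `generate_one` is a hypothetical callable standing in for whatever produces a single image, not an SD.Next or Nunchaku API, and stepping the seed by one per image is an assumption.

```python
def generate_sequentially(generate_one, prompt, count, seed=0):
    """Produce `count` results with batch size 1 by calling `generate_one` repeatedly.

    `generate_one(prompt, seed=...)` is a hypothetical single-image generator.
    """
    results = []
    for index in range(count):
        # Each call is an independent batch-of-one request with its own seed.
        results.append(generate_one(prompt, seed=seed + index))
    return results


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without a GPU or a model.
    def fake_generate(prompt, seed):
        return f"{prompt} [seed={seed}]"

    for item in generate_sequentially(fake_generate, "a red fox", count=3, seed=42):
        print(item)
```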
Manual build
Install CUDA
Warning
Requires CUDA dev installation with NVCC
URL: https://developer.nvidia.com/cuda-12-6-3-download-archive
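Before starting a build, it can help to confirm that `nvcc` is actually on the `PATH` and to see which CUDA release it reports. The following is a minimal sketch under that assumption; the helper names `parse_nvcc_release` and `find_nvcc_release` are hypothetical, and the parser relies on the `release X.Y` text that `nvcc --version` prints.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the (major, minor) CUDA release from `nvcc --version` text."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def find_nvcc_release():
    """Return the release of the nvcc found on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = find_nvcc_release()
    print("nvcc release:", release if release else "not found")
```

A result of `None` usually means that only the CUDA runtime is installed, or that the toolkit's `bin` directory is not on the `PATH`.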
Install docs
URL: https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source
Quick-steps
Note
The build process will take a while, so be patient
cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6
python
>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']
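The interactive checks above can also be collected into a script, which is easier to paste into a bug report. This is a sketch rather than an SD.Next tool: the function name `collect_env_report` and the dictionary keys are hypothetical, and missing packages are reported as `None` instead of raising.

```python
import importlib.util
import platform
import sys


def collect_env_report():
    """Gather the same facts as the interactive session above.

    Values stay None when torch, a CUDA device, or nunchaku is not available.
    """
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "system": platform.system(),
        "torch": None,
        "cuda": None,
        "device": None,
        "nunchaku": None,
    }
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            report["device"] = torch.cuda.get_device_name(0)
    # find_spec locates the package without importing it, so a broken build
    # of nunchaku does not crash the report.
    spec = importlib.util.find_spec("nunchaku")
    if spec is not None and spec.submodule_search_locations:
        report["nunchaku"] = list(spec.submodule_search_locations)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If `nunchaku` shows `None` after a build, the package was most likely installed into a different Python environment than the one running the script.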