# Nunchaku

Nunchaku is a high-performance inference engine from MIT-Han-Lab optimized for 4-bit neural networks. It uses the novel 4-bit SVDQuant quantization scheme via DeepCompressor and can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!
> [!IMPORTANT]
> Nunchaku is supported only on CUDA platforms using Turing/Ampere/Ada/Blackwell GPUs
> Nunchaku requires Python 3.11 or 3.12
> [!NOTE]
> On Blackwell GPUs, Nunchaku will default to FP4 methods, while on other GPU generations it will use INT4 methods
## Install

SD.Next will attempt to auto-install pre-built wheels when possible; if you encounter issues, see the Manual build section below
## Configure

To enable Nunchaku support, set the appropriate quantization options in *Settings -> Quantization*:

- **Enable for modules**: enable or disable Nunchaku for specific modules; currently supports Transformers and TE
- **Nunchaku attention**: overrides the current attention module with Nunchaku's custom FP16 attention mechanism
- **Nunchaku offloading**: overrides the current offloading method with Nunchaku's custom offloading method
## Support

At the moment, Nunchaku supports the following models:

- FLUX.1: both Dev and Schnell variants
- SANA: 1.0-1600M variant
- T5: XXL variant text-encoder, as used by SD35, FLUX.1, and HiDream models
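Since SD.Next fetches a separate prequantized module per supported model, the lookup can be pictured as a simple table. This is a hypothetical sketch: the repo ids follow upstream Nunchaku's `svdq-<precision>-<model>` naming convention, but the exact ids and the set of available combinations are assumptions, not a list taken from SD.Next:

```python
# Illustrative model-to-checkpoint lookup for prequantized Nunchaku modules.
# Assumption: repo ids follow upstream Nunchaku's "svdq-<precision>-<model>"
# naming; treat every id below as a placeholder, not a verified source.

PREQUANTIZED = {
    ("flux.1-dev", "int4"): "mit-han-lab/svdq-int4-flux.1-dev",
    ("flux.1-schnell", "int4"): "mit-han-lab/svdq-int4-flux.1-schnell",
    ("flux.1-dev", "fp4"): "mit-han-lab/svdq-fp4-flux.1-dev",
    ("sana-1600m", "int4"): "mit-han-lab/svdq-int4-sana-1600m",
}

def prequantized_repo(model: str, precision: str) -> str:
    """Return the (assumed) prequantized repo id for a model/precision pair."""
    try:
        return PREQUANTIZED[(model.lower(), precision)]
    except KeyError:
        raise ValueError(f"no prequantized Nunchaku module for {model} at {precision}")
```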
> [!IMPORTANT]
> SD.Next will auto-download Nunchaku's prequantized modules as needed on first access
> Nunchaku replaces the model's DiT module with a custom pre-quantized one, so any model fine-tune will be ignored
> [!NOTE]
> For FLUX.1, Nunchaku supports multiple base models:
> - Black Forest Labs FLUX.1 Dev and Schnell
> - Shuttle Jaguar finetune
## Notes

> [!WARNING]
> Nunchaku is EXPERIMENTAL and many normal features are not supported yet

Nunchaku is compatible with some advanced features:

- LoRA loading: Nunchaku uses a custom LoRA loader, so not all LoRAs may be supported
- Para-attention first-block-cache: enable in *Settings -> Pipeline modifiers*

Unsupported and/or known limitations:

- Batch size
- Model unload causes memory leaks
## Manual build

> [!WARNING]
> Requires a CUDA dev installation with NVCC

- Install CUDA: <https://developer.nvidia.com/cuda-12-6-3-download-archive>
- Install docs: <https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source>
### Quick-steps

> [!NOTE]
> The build process will take a while, so be patient

```bash
cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
```
```log
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6
```
```python
python
>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']
```