Nunchaku
Nunchaku is a high-performance inference engine from MIT HAN Lab optimized for 4-bit neural networks. Nunchaku uses novel 4-bit SVDQuant quantization via DeepCompressor.
Nunchaku can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!
Important
Nunchaku is supported only on CUDA platforms using Turing/Ampere/Ada/Blackwell GPUs
Note
On Blackwell GPUs, Nunchaku will default to FP4 methods,
while on other GPU generations it will use INT4 methods
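The INT4 vs. FP4 choice follows the GPU's CUDA compute capability. A minimal sketch of the idea in plain Python (the function name and mapping are illustrative, not SD.Next's actual code; Turing is SM 7.5, Ampere 8.0/8.6, Ada 8.9, Blackwell 10.x and above):

```python
def nunchaku_precision(major: int, minor: int) -> str:
    """Pick a Nunchaku quantization format from CUDA compute capability.

    Illustrative only: Blackwell (SM >= 10.0) has native FP4 support,
    while older supported generations (Turing/Ampere/Ada) use INT4.
    """
    if (major, minor) < (7, 5):
        raise RuntimeError("GPU generation not supported by Nunchaku")
    return "fp4" if major >= 10 else "int4"

# Ada (SM 8.9) -> int4, Blackwell (SM 12.0) -> fp4
print(nunchaku_precision(8, 9))
print(nunchaku_precision(12, 0))
```

On a live system the compute capability can be queried with `torch.cuda.get_device_capability(0)`.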
Install
SD.Next will attempt to auto-install pre-built wheels when possible,
but if you encounter issues, see the Manual build section below
Configure
To enable Nunchaku support, set appropriate quantization options in Settings -> Quantization
- Enable for modules:
Enable or disable Nunchaku for specific modules
Currently supports the Transformer and Text Encoder (TE) modules
- Nunchaku attention:
Overrides current attention module with Nunchaku's custom fp16 attention mechanism
- Nunchaku offloading:
Overrides current offloading method with Nunchaku's custom offloading method
Support
At the moment, Nunchaku supports the following models:
- FLUX.1 both Dev and Schnell variants
- SANA 1.0-1600M variant
- T5 XXL variant text-encoder
as used by SD35, FLUX.1, HiDream models
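The prequantized modules are fetched from Nunchaku's Hugging Face repositories, whose names combine the quantization format with the model variant. The helper below is hypothetical, but the naming pattern matches published repos such as `mit-han-lab/svdq-int4-flux.1-dev` at the time of writing:

```python
def svdq_repo_id(model: str, precision: str) -> str:
    """Compose a Hugging Face repo id for a Nunchaku SVDQuant checkpoint.

    Hypothetical helper; the pattern mirrors published repos
    such as mit-han-lab/svdq-int4-flux.1-dev.
    """
    if precision not in ("int4", "fp4"):
        raise ValueError("Nunchaku supports int4 or fp4 only")
    return f"mit-han-lab/svdq-{precision}-{model}"

print(svdq_repo_id("flux.1-dev", "int4"))
print(svdq_repo_id("flux.1-schnell", "fp4"))
```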
Important
SD.Next will auto-download Nunchaku's prequantized modules as needed on first access
Nunchaku replaces the model's DiT module with a custom pre-quantized one,
so any model fine-tune will be ignored
Note
For FLUX.1, Nunchaku supports multiple base models:
- Black Forest Labs FLUX.1 Dev and Schnell
- Shuttle Jaguar finetune
Notes
Warning
Nunchaku is EXPERIMENTAL and many normal features are not supported yet
Nunchaku is compatible with some advanced features like:
- LoRA loading
however, Nunchaku uses a custom LoRA loader, so not all LoRAs may be supported
- Para-attention first-block-cache
enable in Settings -> Pipeline modifiers
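First-block-cache skips recomputation of the later transformer blocks on denoising steps where the first block's output barely changes from the previous step. A toy illustration of the caching decision follows; the threshold and numbers are made up, and the real implementation in para-attention works on tensors inside the transformer forward pass:

```python
class FirstBlockCache:
    """Toy first-block-cache: if the first block's output changed little
    since the previous step, reuse the cached output of the remaining
    blocks instead of recomputing them. Threshold is illustrative."""

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold
        self.prev_first = None   # first-block output from the previous step
        self.cached_rest = None  # cached output of the remaining blocks

    def step(self, first_out: float, run_rest):
        # Relative change of the first block's output between steps
        if self.prev_first is not None and self.cached_rest is not None:
            change = abs(first_out - self.prev_first) / (abs(self.prev_first) + 1e-8)
            if change < self.threshold:
                return self.cached_rest, True  # cache hit: skip later blocks
        self.prev_first = first_out
        self.cached_rest = run_rest(first_out)
        return self.cached_rest, False

cache = FirstBlockCache()
out1, hit1 = cache.step(1.00, lambda x: x * 2)  # first step: full compute
out2, hit2 = cache.step(1.01, lambda x: x * 2)  # tiny change: cache hit
print(hit1, hit2)
```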
Unsupported and/or known limitations:
- Batch size
- Model unload causes memory leaks
Manual build
Install CUDA
Warning
Requires a CUDA dev installation with the NVCC compiler
URL: https://developer.nvidia.com/cuda-12-6-3-download-archive
Install docs
URL: https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source
Quick-steps
Note
The build process will take a while, so be patient
cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6
python
>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']