Nunchaku

Nunchaku is a high-performance inference engine from MIT-Han-Lab, optimized for 4-bit neural networks. Nunchaku uses the novel 4-bit SVDQuant quantization scheme produced via DeepCompressor.

Nunchaku can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!

Important

Nunchaku is supported only on CUDA platforms with Turing, Ampere, Ada, or Blackwell GPUs.

Note

On Blackwell GPUs, Nunchaku will default to FP4 methods,
while on earlier GPU generations it will use INT4 methods.
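The FP4-vs-INT4 split follows the GPU's CUDA compute capability: Blackwell parts report SM 10.x or 12.x, while Turing through Ada report SM 7.5 through 8.9. A minimal sketch of such a check, assuming the choice is driven purely by compute capability (the helper name `pick_precision` is illustrative, not SD.Next's actual code):

```python
def pick_precision(major: int, minor: int) -> str:
    # Blackwell (SM 10.x / 12.x) adds native FP4 support;
    # Turing (7.5), Ampere (8.0/8.6), and Ada (8.9) fall back to INT4.
    return "fp4" if major >= 10 else "int4"

try:
    import torch
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)  # e.g. (8, 9) on an RTX 4090
        print(f"SM {major}.{minor} -> {pick_precision(major, minor)}")
except ImportError:
    pass  # torch not installed; the helper above still works standalone
```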

Install

SD.Next will attempt to auto-install pre-built wheels when possible;
if you encounter issues, see the Manual build section below.

Configure

To enable Nunchaku support, set the appropriate quantization options in Settings -> Quantization:
- Enable for modules:
Enable or disable Nunchaku for specific modules.
Currently supports Transformers and TE (text-encoder).
- Nunchaku attention:
Overrides the current attention module with Nunchaku's custom FP16 attention mechanism.
- Nunchaku offloading:
Overrides the current offloading method with Nunchaku's custom offloading method.

Support

At the moment, Nunchaku supports the following models:
- FLUX.1: both Dev and Schnell variants
- SANA: 1.0-1600M variant
- T5: XXL variant text-encoder,
as used by SD35, FLUX.1, and HiDream models

Important

SD.Next will auto-download Nunchaku's prequantized modules as needed on first access.
Nunchaku replaces the model's DiT module with a custom pre-quantized one,
so any model fine-tune will be ignored.
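Outside SD.Next, the same DiT replacement can be done directly with Nunchaku's diffusers integration. A hedged sketch: the repo id and import path follow Nunchaku's README at the time of writing (older releases export the class from a submodule), so verify them against your installed version:

```python
def load_prequantized_flux(repo: str = "mit-han-lab/svdq-int4-flux.1-dev"):
    """Build a FLUX.1 pipeline whose DiT is Nunchaku's pre-quantized transformer.

    Requires a CUDA GPU plus the `nunchaku` and `diffusers` packages.
    The default repo id is the INT4 FLUX.1-Dev checkpoint named in
    Nunchaku's README; adjust it for FP4 or other model variants.
    """
    import torch
    from diffusers import FluxPipeline
    from nunchaku import NunchakuFluxTransformer2dModel

    # The custom transformer replaces the stock DiT, which is why any
    # fine-tuned DiT weights in the base model are ignored -- only the
    # text-encoders, VAE, and scheduler come from the original pipeline.
    transformer = NunchakuFluxTransformer2dModel.from_pretrained(repo)
    return FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    ).to("cuda")
```

The function is deferred (nothing downloads at import time), so it can sit in a script and only run when a GPU and the checkpoints are available.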

Note

For FLUX.1, Nunchaku supports multiple base models:
- Black Forest Labs FLUX.1 Dev and Schnell
- Shuttle Jaguar finetune

Notes

Warning

Nunchaku is EXPERIMENTAL and many standard features are not yet supported.

Nunchaku is compatible with some advanced features:
- LoRA loading:
Nunchaku uses a custom LoRA loader, so not all LoRAs may be supported
- Para-attention first-block-cache:
enable in Settings -> Pipeline modifiers

Unsupported and/or known limitations:
- Batch size
- Model unload causes memory leaks

Manual build

Install CUDA

Warning

Requires a CUDA developer installation with NVCC.

URL: https://developer.nvidia.com/cuda-12-6-3-download-archive

Install docs

URL: https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source

Quick-steps

Note

The build process will take a while, so be patient.

cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6

python

>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']