Control Overview
Native control module for SD.Next on the Diffusers backend.
It supports control-guided generation plus image and text workflows.
For option and setting details, see Control Howto and Control Settings.
Supported Control Models
- lllyasviel ControlNet for SD 1.5 and SD-XL models
  Includes ControlNets, Reference-only mode, and compatible third-party models. Original SD 1.5 ControlNets are about 1.4 GB each, and SDXL models are about 4.9 GB each.
- VisLearn ControlNet XS for SD-XL models
  Lightweight SDXL ControlNet models at about 165 MB with near-identical results.
- TencentARC T2I-Adapter for SD 1.5 and SD-XL models
  Provides similar control functionality at lower resource cost, about 300 MB each.
- Kohya Control LLite for SD-XL models
  Lightweight image control for SDXL, about 46 MB each.
- TencentAILab IP-Adapter for SD 1.5 and SD-XL models
  Style transfer with lower resource cost, below 100 MB for SD 1.5 and about 700 MB for SDXL. Can be combined with ControlNet for more stable batch or video workflows.
- CiaraRowles TemporalNet for SD 1.5 models
  Improves temporal consistency and reduces flicker in batch/video processing.
All built-in models are downloaded on first use and stored in:
/models/controlnet, /models/adapter, /models/xs, /models/lite, /models/processor
Listed below are all models that are supported out-of-the-box:
ControlNet
- SD15:
  Canny, Depth, IP2P, LineArt, LineArt Anime, MLSD, NormalBae, OpenPose, Scribble, Segment, Shuffle, SoftEdge, TemporalNet, HED, Tile
- SDXL:
  Canny Small XL, Canny Mid XL, Canny XL, Depth Zoe XL, Depth Mid XL
Note: only models compatible with the currently loaded base model are listed.
Additional ControlNet models in safetensors can be downloaded manually and placed in: /models/control/controlnet
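Since the backend is Diffusers, these checkpoints can also be loaded directly with the diffusers library. A minimal sketch, assuming a CUDA machine and the upstream lllyasviel Canny checkpoint (this is plain diffusers wiring for illustration, not SD.Next internals):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# load a ControlNet checkpoint; SD.Next stores built-in downloads under /models/controlnet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)

# attach it to a matching SD 1.5 base model
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
```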
ControlNet XS
- SDXL:
Canny, Depth
ControlNet LLLite
- SDXL:
Canny, Canny anime, Depth anime, Blur anime, Pose anime, Replicate anime
Note: Control-LLLite uses an unofficial implementation and is considered experimental.
Additional ControlNet models in safetensors can be downloaded manually and placed in: /models/control/lite
T2I-Adapter
Built-in aliases include:
```
'Segment':   'TencentARC/t2iadapter_seg_sd14v1'
'Zoe Depth': 'TencentARC/t2iadapter_zoedepth_sd15v1'
'OpenPose':  'TencentARC/t2iadapter_openpose_sd14v1'
'KeyPose':   'TencentARC/t2iadapter_keypose_sd14v1'
'Color':     'TencentARC/t2iadapter_color_sd14v1'
'Depth v1':  'TencentARC/t2iadapter_depth_sd14v1'
'Depth v2':  'TencentARC/t2iadapter_depth_sd15v2'
'Canny v1':  'TencentARC/t2iadapter_canny_sd14v1'
'Canny v2':  'TencentARC/t2iadapter_canny_sd15v2'
'Sketch v1': 'TencentARC/t2iadapter_sketch_sd14v1'
'Sketch v2': 'TencentARC/t2iadapter_sketch_sd15v2'
```
- SD15:
  Segment, Zoe Depth, OpenPose, KeyPose, Color, Depth v1, Depth v2, Canny v1, Canny v2, Sketch v1, Sketch v2
- SDXL:
  Canny XL, Depth Zoe XL, Depth Midas XL, LineArt XL, OpenPose XL, Sketch XL
Note: Only models compatible with the currently loaded base model are listed.
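The aliases above resolve to Hugging Face repo IDs, so the same checkpoints load with plain diffusers as well. A minimal sketch using the 'Canny v2' alias (the base-model ID is illustrative):

```python
import torch
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline

# the 'Canny v2' alias resolves to this repo ID
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_canny_sd15v2", torch_dtype=torch.float16
)

pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16
).to("cuda")
```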
Processors
- Pose style: OpenPose, DWPose, MediaPipe Face
- Outline style: Canny, Edge, LineArt Realistic, LineArt Anime, HED, PidiNet
- Depth style: Midas Depth Hybrid, Zoe Depth, Leres Depth, Normal Bae
- Segmentation style: SegmentAnything
- Other: MLSD, Shuffle
Note: Processor sizes range from built-in options with no extra size up to about 4.2 GB for ZoeDepth-Large.
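A processor converts the input image into a control map before it reaches the control model. As a rough sketch of what the Canny processor computes, using OpenCV directly (thresholds are illustrative defaults; SD.Next's own processor may differ):

```python
import cv2
import numpy as np
from PIL import Image

def canny_control_map(image: Image.Image, low: int = 100, high: int = 200) -> Image.Image:
    """Produce a 3-channel edge map in the format control models expect."""
    gray = cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, low, high)                      # single-channel edge map
    return Image.fromarray(np.stack([edges] * 3, axis=-1))  # replicate to RGB
```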
Segmentation Models
There are 8 auto-segmentation models available:
- Facebook SAM ViT Base (357MB)
- Facebook SAM ViT Large (1.16GB)
- Facebook SAM ViT Huge (2.56GB)
- SlimSAM Uniform (106MB)
- SlimSAM Uniform Tiny (37MB)
- Rembg Silueta
- Rembg U2Net
- Rembg ISNet
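The Rembg entries ship with the rembg package and can be exercised standalone. A minimal sketch, assuming rembg is installed ("u2net" matches the Rembg U2Net entry; "silueta" and "isnet-general-use" cover the other two):

```python
from PIL import Image
from rembg import new_session, remove

# "u2net" corresponds to the Rembg U2Net entry above
session = new_session("u2net")

image = Image.open("input.png")
cutout = remove(image, session=session)  # image with background removed
cutout.save("output.png")
```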
Reference
Reference mode uses its own pipeline, so it cannot use multiple units or processors.
Workflows
Inputs & Outputs
- Image -> Image
- Batch: list of images -> Gallery and/or Video
- Folder: folder with images -> Gallery and/or Video
- Video -> Gallery and/or Video
Notes:
- Input/Output/Preview panels can be minimized by clicking on them
- For video output, make sure to set video options
Unit
- A unit is: input plus processor plus control
- The pipeline consists of any number of configured units
- If a unit uses control modules, all control modules in that pipeline must be the same type, e.g. ControlNet, ControlNet-XS, T2I-Adapter, or Reference (see the sketch after this list)
- Each unit can use the primary input or its own override input
- Each unit can have no processor, in which case control runs directly on the input. Use this when using predefined input templates.
- A unit can have no control, in which case it runs the processor only
- Any combination of input, processor, and control is possible
  For example, two enabled units with processor only produce a compound processed image without control.
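The same-type constraint mirrors how multiple control models compose in diffusers, which accepts a list of ControlNets but not a mix of model types. A minimal sketch of two ControlNet units on one pipeline (checkpoints are upstream lllyasviel models; `canny_map` and `depth_map` stand in for PIL control images produced by the processors):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# two units, both of the same type (ControlNet)
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

# one control image per unit, with per-unit strengths
result = pipe(
    "a house in the forest",
    image=[canny_map, depth_map],                  # control maps from the processors
    controlnet_conditioning_scale=[1.0, 0.5],
).images[0]
```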
What-if?
- If no input is provided, the pipeline runs in txt2img mode (see the mode-selection sketch after this list)
  It can be freely used instead of standard txt2img
- If no unit has control or an adapter, the pipeline runs in img2img mode using the input image
  It can be freely used instead of standard img2img
- If a processor is enabled but no ControlNet or adapter is loaded, the pipeline runs in img2img mode using the processed input
- If multiple processors are enabled but no ControlNet or adapter is loaded, the pipeline runs in img2img mode on the blended processed image
- Output resolution defaults to the input resolution
  Use resize settings to force any resolution
- The resize operation can run before processing (on the input image) or after it (on the output image)
- Video input runs the pipeline on each frame unless skip frames is set
  Video output is a standard list of images (gallery) and can optionally be encoded into a video file
  The video file can be interpolated using RIFE for smoother playback
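The mode rules above reduce to a simple decision. A hedged pseudocode restatement (illustrative only, not SD.Next source; unit attribute names are assumptions):

```python
def select_mode(input_image, units) -> str:
    """Illustrative restatement of the what-if rules above; not SD.Next internals."""
    if input_image is None:
        return "txt2img"                               # no input at all
    if not any(u.control or u.adapter for u in units):
        return "img2img"                               # processors only: plain or processed img2img
    return "control"                                   # at least one control or adapter unit
```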
Overrides
- Control can be based on the main input, or each individual unit can have its own override input
- By default, control runs in control+txt2img mode
- If an init image is provided, control runs in control+img2img mode
  The init image can be the same as the control image or separate
- IP adapter can be applied to any workflow (see the sketch below)
- IP adapter can use the same input as the control input or a separate one
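For reference, this is how an IP adapter attaches to a diffusers pipeline outside SD.Next; a minimal sketch, assuming `pipe` is an already-loaded SD 1.5 pipeline and `style_image` is a PIL reference image (the repo and weight name are the upstream h94/IP-Adapter defaults):

```python
# pipe is an already-loaded SD 1.5 diffusers pipeline
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # strength of the style/reference influence

result = pipe(
    "a watercolor landscape",
    ip_adapter_image=style_image,  # PIL image used as the style reference
).images[0]
```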
Inpaint
- Inpaint workflow is triggered when input image is provided in inpaint mode
- Inpaint mode can be used with image-to-image or ControlNet workflows
- Other unit types such as T2I, XS or Lite do not support inpaint mode
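For the ControlNet case, diffusers exposes a dedicated inpaint pipeline. A minimal sketch (the checkpoint is lllyasviel's upstream inpaint ControlNet; `init_image`, `mask`, and `control_map` stand in for PIL images):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe(
    "a red sofa",
    image=init_image,           # original image (PIL)
    mask_image=mask,            # white = area to repaint
    control_image=control_map,  # conditioning image for the ControlNet
).images[0]
```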
Outpaint
- Outpaint workflow is triggered when input image is provided in outpaint mode
- Outpaint mode can be used with image-to-image or ControlNet workflows
- Other unit types such as T2I, XS or Lite do not support outpaint mode
- Recommended denoising strength is at least 0.8, because the outpainted area starts blank and needs noise.
- Influence from the original image is controlled with the overlap setting: higher overlap includes more of the original image in outpaint processing.
Logging
To enable extra logging for troubleshooting, set environment variables before running SD.Next.
- Linux:
  ```bash
  export SD_CONTROL_DEBUG=true
  export SD_PROCESS_DEBUG=true
  ./webui.sh --debug
  ```
- Windows:
  ```cmd
  set SD_CONTROL_DEBUG=true
  set SD_PROCESS_DEBUG=true
  webui.bat --debug
  ```
Note: Starting with debug enabled also enables Test mode in the Control module.
Known issues
DWPose
DWPose preprocessor internally uses openmim/mmengine/mmpose/mmdet packages.
These packages have not been updated in several years, so compatibility with newer torch and system packages can be limited.
You can try installing DWPose dependencies manually:
- Install the full CUDA Toolkit, as mmengine requires the NVIDIA compiler (nvcc)
  Note: the CUDA version should match the CUDA version bundled with torch, shown in SD.Next logs:
  `Torch: torch==2.6.0+cu126 torchvision==0.21.0+cu126`
  here `cu126` means CUDA version 12.6
- Install build tools for your platform
  Linux: `build-essential`/`gcc`/`make` or Windows: Visual Studio Build Tools
- Activate your venv
  Linux: `source venv/bin/activate` or Windows: `venv\Scripts\activate`
- Install requirements:
  `pip install --upgrade --no-deps --force-reinstall termcolor xtcocotools terminaltables pycocotools munkres shapely openmim==0.3.9 mmengine==0.10.5 mmcv==2.2.0 mmpose==1.3.2 mmdet==3.3.0`
  Note that this can take a long time