Scripts

X/Y/Z Grid

The X/Y/Z Grid script generates multiple images with automatic parameter changes and displays results in labeled grids.

To enable it, scroll to the Script dropdown and select "X/Y/Z Grid".

X, Y, and Z types specify what to change:

X type: Creates columns
Y type: Creates rows
Z type: Creates separate grid images (emulating a "3D grid")

For some types, use a dropdown to select values. For others, enter comma-separated values.
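
The layout rules above (X as columns, Y as rows, Z as separate grids) can be sketched in Python. This is only an illustration of the combinatorics, not SD.Next's implementation; the helper name `xyz_cells` and the example axis values are hypothetical.

```python
from itertools import product


def xyz_cells(x_values, y_values, z_values):
    """Yield (grid, row, column, values) for every image in an X/Y/Z run.

    Hypothetical helper: Z picks the grid image, Y the row, X the column.
    """
    for (zi, z), (yi, y), (xi, x) in product(
        enumerate(z_values), enumerate(y_values), enumerate(x_values)
    ):
        yield zi, yi, xi, (x, y, z)


# Example axis values, assumed for illustration only.
cells = list(xyz_cells(x_values=[20, 30, 40], y_values=[5.0, 7.5], z_values=["a", "b"]))
print(len(cells))  # 3 columns x 2 rows x 2 grids = 12 images
```

The total image count is the product of the three value lists, so each added value on any axis multiplies the generation time.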

Prompt S/R

"Prompt S/R" is Prompt Search and Replace. After selecting this type, enter the word to search for (which must be in your prompt) followed by comma-separated replacement words.

Example: If generating an image with the prompt "a lazy cat" and you set Prompt S/R to cat,dog,monkey, the script creates 3 images:

- a lazy cat
- a lazy dog
- a lazy monkey

You can use multiple words or entire prompts: lazy cat,boisterous dog,mischievous monkey or a lazy cat,three blind mice,an astronaut on the moon.

Embeddings and LoRAs are also valid search and replace terms: <lora:FirstLora:1>,<lora:SecondLora:1>,<lora:ThirdLora:1>.

You can also change LoRA strength: <lora:FirstLora:1>,<lora:FirstLora:0.75>,<lora:FirstLora:0.5>,<lora:FirstLora:0.25> (can be shortened to FirstLora:1,FirstLora:0.75,FirstLora:0.5,FirstLora:0.25).
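
The search-and-replace behaviour described in this section can be sketched as follows. The function name `prompt_sr` is hypothetical, and the plain comma split is a simplifying assumption: it does not handle values that themselves contain commas, and it is not SD.Next's actual parser.

```python
def prompt_sr(prompt, sr_values):
    """Expand a Prompt S/R value list into one prompt per entry.

    The first entry is the search term; every entry (including the first)
    is substituted for it in turn.
    """
    terms = [term.strip() for term in sr_values.split(",")]
    search = terms[0]
    if search not in prompt:
        raise ValueError(f"search term {search!r} is not in the prompt")
    return [prompt.replace(search, term) for term in terms]


print(prompt_sr("a lazy cat", "cat,dog,monkey"))
# ['a lazy cat', 'a lazy dog', 'a lazy monkey']
```

Because the first entry replaces itself, the unmodified prompt is always the first image in the series.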

Face Script

SD.Next supports 4 face scripts:

FaceID

Select the desired FaceID model and upload a clear picture of the desired face.

Strength: How much the script is applied to the image
Structure: How closely the generated image should resemble the uploaded image

FaceSwap

Upload a clear picture of the desired face.

InstantID

Add an input image with a clear picture of the desired face.

Strength: How much the script is applied to the image
Control: How closely the generated image should resemble the uploaded image

PhotoMaker

Add an input image with a clear picture of the desired face.

Strength: How much the script is applied to the image
Start: When the script should be activated during image generation

Kohya HiRes Fix

The Kohya HiRes Fix generates higher-resolution images without deformities. It requires experimentation to find optimal settings for your use case.

Select the script and adjust settings as needed. Common parameters:

Scale Factor: Determines the scaling magnitude applied to input data
Timestep: Represents the time step used in processing; determines processing granularity
Block: Represents the number of blocks used; determines data partitioning into smaller segments

LayerDiffuse

LayerDiffuse creates transparent images with Diffusers. Select LayerDiffuse in the scripts and click "Apply to Model" after configuring. To disable it, uncheck the script.

Note

Reload the model and reapply after making changes like adding LoRA, ControlNet, or IP Adapters.

Mixture Tiling

Mixture of Diffusers allows detailed control over composition by harmonizing multiple diffusion processes on different canvas regions. This enables generating larger images where each object and style is controlled by a separate process.

To use it, select the script and enter prompts separated by newlines:

bird
plane
dog
cat

Set X and Y so that X × Y equals the number of prompts. For the example above, use X=2 and Y=2.

Overlap: Set overlap regions to 0 for a combined grid, or adjust to allow images to blend smoothly.
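
The relationship between the prompt list and the X and Y settings can be checked with a short sketch. The helper `prompts_to_grid` is hypothetical, and filling the grid row by row is an assumption of this example rather than documented behaviour.

```python
def prompts_to_grid(text, x, y):
    """Arrange newline-separated prompts into y rows of x columns.

    Assumes prompts fill the grid row by row (an assumption of this sketch).
    """
    prompts = [line.strip() for line in text.splitlines() if line.strip()]
    if len(prompts) != x * y:
        raise ValueError(
            f"expected {x * y} prompts for a {x}x{y} grid, got {len(prompts)}"
        )
    return [prompts[row * x:(row + 1) * x] for row in range(y)]


grid = prompts_to_grid("bird\nplane\ndog\ncat", x=2, y=2)
print(grid)  # [['bird', 'plane'], ['dog', 'cat']]
```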

MuLan

MuLan equips diffusion models with multilingual generation in 110+ languages. Simply enable it in the scripts and prompt in your desired language.

Prompt Matrix

Prompt Matrix generates a grid of images to test and compare different prompt components. Enable it and create your prompt like this:

Woman|Red hair|Blue eyes

Set at Prompt Start: Reorder so secondary prompts come before the primary
Random Seeds: Use a different seed for each image
Prompt Type: Select which prompt type to apply this to
Joining Char: Choose separator (comma or space)
Grid Margins: Space between images
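
As a rough illustration of how a prompt such as the one above expands, the sketch below treats the first segment as the primary prompt and toggles every combination of the remaining segments. This mirrors how prompt-matrix tools commonly work and is an assumption here, not a description of SD.Next's code; `prompt_matrix` and its parameters are hypothetical names loosely modelled on the options listed above.

```python
def prompt_matrix(prompt, joining_char=", ", at_start=False):
    """Return one prompt per combination of the optional '|' segments."""
    parts = [part.strip() for part in prompt.split("|")]
    base, optional = parts[0], parts[1:]
    results = []
    for mask in range(2 ** len(optional)):
        # Bit i of the mask decides whether optional segment i is included.
        chosen = [part for bit, part in enumerate(optional) if mask & (1 << bit)]
        ordered = chosen + [base] if at_start else [base] + chosen
        results.append(joining_char.join(ordered))
    return results


for line in prompt_matrix("Woman|Red hair|Blue eyes"):
    print(line)
# Woman
# Woman, Red hair
# Woman, Blue eyes
# Woman, Red hair, Blue eyes
```

With n optional segments the grid holds 2^n images, so long prompts with many separators grow quickly.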

Prompt from File

Load generation settings and prompts from a file. Create a .txt file with settings like:

--prompt "whatever you want" --negative_prompt "whatever you don't want" --steps 30 --cfg_scale 10 --sampler_name "DPM++ SDE Karras" --seed -1 --width 512 --height 768

Then upload the file to SD.Next in the prompt upload section. You can also type settings directly in the prompts box for the same result (though changes won't be saved after shutdown).
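
To check a settings line before uploading it, a small parser along these lines can help. It is a sketch that assumes every option takes exactly one value and that values containing spaces are double-quoted, as in the example above; `parse_settings_line` is a hypothetical helper, not part of SD.Next.

```python
import shlex


def parse_settings_line(line):
    """Parse alternating '--key value' tokens from one line into a dict of strings."""
    tokens = shlex.split(line)
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating --key value tokens")
    settings = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if not key.startswith("--"):
            raise ValueError(f"expected an option name, got {key!r}")
        settings[key[2:]] = value
    return settings


line = (
    '--prompt "whatever you want" --negative_prompt "whatever you don\'t want" '
    '--steps 30 --cfg_scale 10 --sampler_name "DPM++ SDE Karras" --seed -1 '
    '--width 512 --height 768'
)
settings = parse_settings_line(line)
print(settings["sampler_name"], settings["seed"])  # DPM++ SDE Karras -1
```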

Regional Prompting

Regional Prompting divides the canvas into multiple regions, each with separate prompts. Regions can be specified as a grid or calculated from prompts.

Cols and Rows

Split the screen vertically and horizontally, assigning a prompt to each region. The split ratio is specified by 'div' (e.g. '3;3;2' or '0.1;0.5'). You can also subdivide regions for more complex layouts.

Example:

Mode: rows
Prompt: green hair twintail BREAK
        red blouse BREAK
        blue skirt
Grid sections: 1,1,1

Advanced example:

Mode: rows
Prompt: blue sky BREAK
        green hair BREAK
        book shelf BREAK
        terrarium on the desk BREAK
        orange dress and sofa
Grid sections: 1,2,1,1;2,4,6
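
For the simple example, a ratio list such as 1,1,1 can be read as proportional shares of the canvas. The sketch below converts a flat ratio list into fractional start and end positions. It assumes that each value is a relative weight and deliberately does not handle the nested syntax with semicolons used in the advanced example; `split_bounds` is a hypothetical helper.

```python
from fractions import Fraction


def split_bounds(ratios):
    """Turn a flat ratio string such as "1,1,1" into (start, end) canvas fractions."""
    weights = [Fraction(value.strip()) for value in ratios.split(",")]
    total = sum(weights)
    bounds, start = [], Fraction(0)
    for weight in weights:
        end = start + weight / total
        bounds.append((start, end))
        start = end
    return bounds


prompts = ["green hair twintail", "red blouse", "blue skirt"]
for prompt, (start, end) in zip(prompts, split_bounds("1,1,1")):
    print(f"{prompt}: {float(start):.2f} to {float(end):.2f}")
```

Exact fractions are used so that the last region always ends at 1 without rounding drift.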

Prompt and Prompt-EX

In Prompt mode, the effects of overlapping regions are added together. In Prompt-EX mode, overlapping regions are overwritten in sequence. Because regions are processed in order, define larger regions first so that smaller regions defined later keep their effect.

Prompt-EX example:

Mode: Prompt-EX
Prompt: a girl in street with shirt, tie, skirt BREAK
        red, shirt BREAK
        green, tie BREAK
        blue, skirt
Prompt thresholds: 0.4,0.6,0.6

Threshold

The threshold determines the mask created from each region prompt. Set one threshold per mask, separated by commas and listed in the same order as the BREAK-separated prompts. Suitable values vary widely with the target: loosely defined targets such as hair need small values, while a face needs larger values.
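
The pairing of thresholds with BREAK-separated prompts can be sketched as below. It assumes the first segment is the base prompt and has no threshold of its own, which is inferred from the Prompt-EX example (four segments, three thresholds); `pair_thresholds` is a hypothetical helper.

```python
def pair_thresholds(prompt, thresholds):
    """Pair every BREAK segment after the first with its threshold, in order."""
    segments = [segment.strip() for segment in prompt.split("BREAK")]
    base, regions = segments[0], segments[1:]
    values = [float(value) for value in thresholds.split(",")]
    if len(values) != len(regions):
        raise ValueError(
            f"{len(regions)} region prompts need {len(regions)} thresholds, got {len(values)}"
        )
    return base, list(zip(regions, values))


base, pairs = pair_thresholds(
    "a girl in street with shirt, tie, skirt BREAK red, shirt BREAK green, tie BREAK blue, skirt",
    "0.4,0.6,0.6",
)
print(pairs)  # [('red, shirt', 0.4), ('green, tie', 0.6), ('blue, skirt', 0.6)]
```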

Power

How much regional prompting is applied to image generation.

ResAdapter

ResAdapter is a resolution adapter enabling any diffusion model to generate resolution-free images without additional training or inference overhead.

Models	Parameters	Resolution Range	Ratio Range
resadapter_v2_sd1.5	0.9M	128 <= x <= 1024	0.28 <= r <= 3.5
resadapter_v2_sdxl	0.5M	256 <= x <= 1536	0.28 <= r <= 3.5
resadapter_v1_sd1.5	0.9M	128 <= x <= 1024	0.5 <= r <= 2
resadapter_v1_sd1.5_extrapolation	0.9M	512 <= x <= 1024	0.5 <= r <= 2
resadapter_v1_sd1.5_interpolation	0.9M	128 <= x <= 512	0.5 <= r <= 2
resadapter_v1_sdxl	0.5M	256 <= x <= 1536	0.5 <= r <= 2
resadapter_v1_sdxl_extrapolation	0.5M	1024 <= x <= 1536	0.5 <= r <= 2
resadapter_v1_sdxl_interpolation	0.5M	256 <= x <= 1024	0.5 <= r <= 2
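
The table can be turned into a quick lookup to check whether a target size is within a model's listed range. This sketch assumes that the resolution range applies to both width and height and that the ratio r is width divided by height; neither is stated explicitly above. `in_range` is a hypothetical helper.

```python
# (min side, max side), (min ratio, max ratio), copied from the table above.
RESADAPTER_RANGES = {
    "resadapter_v2_sd1.5": ((128, 1024), (0.28, 3.5)),
    "resadapter_v2_sdxl": ((256, 1536), (0.28, 3.5)),
    "resadapter_v1_sd1.5": ((128, 1024), (0.5, 2)),
    "resadapter_v1_sd1.5_extrapolation": ((512, 1024), (0.5, 2)),
    "resadapter_v1_sd1.5_interpolation": ((128, 512), (0.5, 2)),
    "resadapter_v1_sdxl": ((256, 1536), (0.5, 2)),
    "resadapter_v1_sdxl_extrapolation": ((1024, 1536), (0.5, 2)),
    "resadapter_v1_sdxl_interpolation": ((256, 1024), (0.5, 2)),
}


def in_range(model, width, height):
    """Return True if width, height, and width/height fall inside the listed ranges."""
    (side_min, side_max), (ratio_min, ratio_max) = RESADAPTER_RANGES[model]
    ratio = width / height
    return (
        side_min <= width <= side_max
        and side_min <= height <= side_max
        and ratio_min <= ratio <= ratio_max
    )


print(in_range("resadapter_v2_sd1.5", 512, 768))   # True
print(in_range("resadapter_v1_sd1.5", 1024, 256))  # False: ratio 4 exceeds 2
```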

Weight

How much ResAdapter should be applied to the image generation.

T-Gate

T-Gate efficiently generates images by caching and reusing attention outputs at scheduled time steps. Experiments show T-Gate’s broad applicability to various existing text-conditional diffusion models which it speeds up by 10-50%.

Enable T-Gate in the scripts, then experiment with the step settings to see what works best for your needs.

Text-to-Video

Text-to-Video is a built-in script that makes animated art much easier to create. It offers multiple models, and the best choice depends on your configuration and personal preference.

Select the script from the Script dropdown and choose the number of frames. Then fill in the positive and negative prompts and any other settings as you normally would, choose the output format, and click Generate.

DemoFusion

The DemoFusion framework extends open-source GenAI models with Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. Its progressive nature requires more passes, but the intermediate results can serve as "previews", which helps with rapid prompt iteration. You can find more information in the DemoFusion project documentation.

Denoising batch size: The batch size for multiple denoising paths. Typically, a larger batch size can result in higher efficiency but comes with increased GPU memory requirements.
Stride: The stride of moving local patches. A smaller stride is better for alleviating seam issues, but it also introduces additional computational overhead and inference time.
Cosine_scale_1: Controls the decreasing rate of skip-residual. A smaller value results in better consistency with low-resolution results, but it may lead to more pronounced upsampling noise. For details, refer to Appendix C in the DemoFusion paper.
Cosine_scale_2: Controls the decreasing rate of dilated sampling. A smaller value can better address the repetition issue, but it may lead to grainy images. For details, refer to Appendix C in the DemoFusion paper.
Cosine_scale_3: Controls the decreasing rate of the Gaussian filter. A smaller value results in less grainy images, but it may lead to over-smoothed images. For details, refer to Appendix C in the DemoFusion paper.
Sigma: The standard deviation of the Gaussian filter. A larger sigma strengthens the global guidance of dilated sampling, but it can cause over-smoothing.
Multi_decoder: Determines whether to use a tiled decoder. Generally, a tiled decoder becomes necessary when the resolution exceeds 3072*3072 on an RTX 3090 GPU.