Scripts
Quick links
- X/Y/Z Grid
- Face Script
- Kohya HiRes Fix
- LayerDiffuse
- Mixture Tiling
- MuLan
- Prompt Matrix
- Prompt from File
- Regional Prompting
- ResAdapter
- T-Gate
- Text-to-Video
- DemoFusion
Script List
X/Y/Z Grid
The X/Y/Z Grid script generates multiple images with automatic parameter changes and displays results in labeled grids.
To enable it, scroll to the Script dropdown and select "X/Y/Z Grid".
X, Y, and Z types specify what to change:
- X type: Creates columns
- Y type: Creates rows
- Z type: Creates separate grid images (emulating a "3D grid")
For some types, use a dropdown to select values. For others, enter comma-separated values.
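As a rough illustration of how the three axes combine, the sketch below enumerates one generation job per grid cell. The function name `grid_jobs` and the axis values are hypothetical and are not part of SD.Next; the sketch only shows that the image count is the product of the three value lists.

```python
from itertools import product


def grid_jobs(x_values, y_values, z_values):
    """Yield one job per cell: Z picks the grid image, Y the row, X the column."""
    for z, y, x in product(z_values, y_values, x_values):
        yield {"x": x, "y": y, "z": z}


# Hypothetical axis choices: steps on X, CFG scale on Y, sampler on Z.
jobs = list(grid_jobs([20, 30], [5, 7, 9], ["Euler a"]))
print(len(jobs))  # 2 columns * 3 rows * 1 grid image
```

Adding a second value on the Z axis would double the number of images, so keep the value lists short while experimenting.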
Prompt S/R
"Prompt S/R" is Prompt Search and Replace. After selecting this type, enter the word to search for (which must be in your prompt) followed by comma-separated replacement words.
Example: If you generate an image with the prompt "a lazy cat" and set Prompt S/R to cat,dog,monkey, the script creates 3 images:
- a lazy cat
- a lazy dog
- a lazy monkey
You can use multiple words or entire prompts: lazy cat,boisterous dog,mischievous monkey or a lazy cat,three blind mice,an astronaut on the moon.
Embeddings and LoRAs are also valid search and replace terms: <lora:FirstLora:1>,<lora:SecondLora:1>,<lora:ThirdLora:1>.
You can also change LoRA strength: <lora:FirstLora:1>,<lora:FirstLora:0.75>,<lora:FirstLora:0.5>,<lora:FirstLora:0.25> (can be shortened to FirstLora:1,FirstLora:0.75,FirstLora:0.5,FirstLora:0.25).
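The search-and-replace behaviour described above can be sketched in a few lines. This is a minimal illustration, not the script's actual code: `prompt_sr` is a hypothetical helper, and it assumes plain substring replacement with the first entry acting as the search term.

```python
def prompt_sr(prompt, values):
    """Expand a Prompt S/R value list into one prompt per entry."""
    terms = [v.strip() for v in values.split(",")]
    search = terms[0]  # the first entry is the text to search for
    if search not in prompt:
        raise ValueError(f"search term {search!r} is not in the prompt")
    # The first result is the unchanged prompt (search replaced by itself).
    return [prompt.replace(search, term) for term in terms]


prompts = prompt_sr("a lazy cat", "cat,dog,monkey")
lora_prompts = prompt_sr("portrait <lora:FirstLora:1>", "FirstLora:1,FirstLora:0.5")
```

Under this assumption the shortened LoRA form works because `FirstLora:1` is itself a substring of `<lora:FirstLora:1>`.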
Face Script
SD.Next supports 4 face scripts:
FaceID
Select the desired FaceID model and upload a clear picture of the desired face.
- Strength: How strongly the script is applied to the image
- Structure: How closely the generated image resembles the uploaded image
FaceSwap
Upload a clear picture of the desired face.
InstantID
Add an input image with a clear picture of the desired face.
- Strength: How strongly the script is applied to the image
- Control: How closely the generated image resembles the uploaded image
PhotoMaker
Add an input image with a clear picture of the desired face.
- Strength: How much the script is applied to the image
- Start: When the script should be activated during image generation
Kohya HiRes Fix
The Kohya HiRes Fix helps generate higher-resolution images with fewer deformities. Finding the optimal settings for your use case takes some experimentation.
Select the script and adjust settings as needed. Common parameters:
- Scale Factor: Determines the scaling magnitude applied to input data
- Timestep: Represents the time step used in processing; determines processing granularity
- Block: Represents the number of blocks used; determines data partitioning into smaller segments
LayerDiffuse
LayerDiffuse creates transparent images with Diffusers. Select LayerDiffuse in the scripts and click "Apply to Model" after configuring. To disable it, uncheck the script.
Note
Reload the model and reapply after making changes like adding LoRA, ControlNet, or IP Adapters.
Mixture Tiling
Mixture of Diffusers allows detailed control over composition by harmonizing multiple diffusion processes on different canvas regions. This enables generating larger images where each object and style is controlled by a separate process.
To use it, select the script and enter prompts separated by newlines:
bird
plane
dog
cat
Set X and Y so that X × Y equals the number of prompts. For the example above, use X=2 and Y=2.
Overlap: Set overlap regions to 0 for a combined grid, or adjust to allow images to blend smoothly.
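The X × Y rule can be checked before generating. The sketch below is illustrative only: `tile_layout` is a hypothetical helper, and it assumes prompts fill the grid row by row from left to right, which the text above does not state.

```python
def tile_layout(prompt_text, x, y):
    """Arrange newline-separated prompts into y rows of x columns."""
    prompts = [line.strip() for line in prompt_text.splitlines() if line.strip()]
    if len(prompts) != x * y:
        raise ValueError(f"expected {x * y} prompts for a {x}x{y} grid, got {len(prompts)}")
    # Assumption: prompts are assigned row by row, left to right.
    return [prompts[row * x:(row + 1) * x] for row in range(y)]


layout = tile_layout("bird\nplane\ndog\ncat", x=2, y=2)
```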
MuLan
MuLan equips diffusion models with multilingual generation in 110+ languages. Simply enable it in the scripts and prompt in your desired language.
Prompt Matrix
Prompt Matrix generates a grid of images to test and compare different prompt components. Enable it and create your prompt like this:
Woman|Red hair|Blue eyes
- Set at Prompt Start: Reorder so secondary prompts come before the primary
- Random Seeds: Use a different seed for each image
- Prompt Type: Select which prompt type to apply this to
- Joining Char: Choose separator (comma or space)
- Grid Margins: Space between images
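To get a feel for how many images a matrix prompt produces, here is a small sketch. It assumes the common prompt-matrix behaviour in which the first part is always kept and every combination of the remaining parts is tried; SD.Next may order or build the combinations differently. `prompt_matrix`, `joiner`, and `at_start` are hypothetical names that mirror the Joining Char and Set at Prompt Start options.

```python
from itertools import combinations


def prompt_matrix(prompt, joiner=", ", at_start=False):
    """Return every combination of the optional parts joined with the base part."""
    base, *options = [part.strip() for part in prompt.split("|")]
    results = []
    for n in range(len(options) + 1):
        for combo in combinations(options, n):
            # Assumption: at_start places the optional parts before the base part.
            parts = (list(combo) + [base]) if at_start else ([base] + list(combo))
            results.append(joiner.join(parts))
    return results


variants = prompt_matrix("Woman|Red hair|Blue eyes")
```

With two optional parts this yields four prompts; each extra part doubles the count.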
Prompt from File
Load generation settings and prompts from a file. Create a .txt file with settings like:
--prompt "whatever you want" --negative_prompt "whatever you don't want" --steps 30 --cfg_scale 10 --sampler_name "DPM++ SDE Karras" --seed -1 --width 512 --height 768
Then upload the file to SD.Next in the prompt upload section. You can also type settings directly in the prompts box for the same result (though changes won't be saved after shutdown).
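If you want to validate a settings file before uploading it, a line in the format above can be parsed with Python's standard library. This is only a sketch: it covers the flags shown in the example, and `parse_prompt_line` is a hypothetical helper rather than the parser SD.Next uses.

```python
import argparse
import shlex


def parse_prompt_line(line):
    """Parse one line of a prompt file into a dict of settings."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--prompt", default="")
    parser.add_argument("--negative_prompt", default="")
    parser.add_argument("--steps", type=int)
    parser.add_argument("--cfg_scale", type=float)
    parser.add_argument("--sampler_name")
    parser.add_argument("--seed", type=int)
    parser.add_argument("--width", type=int)
    parser.add_argument("--height", type=int)
    # shlex keeps quoted values such as "DPM++ SDE Karras" together.
    return vars(parser.parse_args(shlex.split(line)))


line = (
    '--prompt "whatever you want" --negative_prompt "whatever you don\'t want" '
    '--steps 30 --cfg_scale 10 --sampler_name "DPM++ SDE Karras" '
    '--seed -1 --width 512 --height 768'
)
settings = parse_prompt_line(line)
```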
Regional Prompting
Regional Prompting divides the canvas into multiple regions, each with separate prompts. Regions can be specified as a grid or calculated from prompts.
Cols and Rows
Split the screen vertically and horizontally, assigning a prompt to each region. The split ratio is specified by 'div' (e.g. '3;3;2' or '0.1;0.5'). You can also subdivide regions for more complex layouts.
Example:
Mode: rows
Prompt: green hair twintail BREAK
red blouse BREAK
blue skirt
Grid sections: 1,1,1
Advanced example:
Mode: rows
Prompt: blue sky BREAK
green hair BREAK
book shelf BREAK
terrarium on the desk BREAK
orange dress and sofa
Grid sections: 1,2,1,1;2,4,6
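The sketch below shows one way to read the grid sections strings used in the two examples. It rests on an assumption inferred from the advanced example, which has five prompts: within each semicolon-separated group, the first number is the ratio of the outer split and the remaining numbers are ratios of the subdivision inside it. The helper names are hypothetical.

```python
def split_regions(prompt):
    """Split a prompt into per-region prompts at the BREAK keyword."""
    return [part.strip() for part in prompt.split("BREAK")]


def normalize(values):
    """Turn ratios into (start, end) fractions of the available span."""
    total = sum(values)
    bounds, start = [], 0.0
    for value in values:
        end = start + value / total
        bounds.append((start, end))
        start = end
    return bounds


def parse_sections(spec):
    """Return outer and inner cell bounds for a grid sections string."""
    if ";" not in spec:
        spec = spec.replace(",", ";")  # "1,1,1" is a plain split with no subdivisions
    groups = [[float(n) for n in group.split(",")] for group in spec.split(";")]
    outer = normalize([group[0] for group in groups])
    inner = [normalize(group[1:]) if len(group) > 1 else [(0.0, 1.0)] for group in groups]
    return outer, inner


outer, inner = parse_sections("1,2,1,1;2,4,6")
region_count = sum(len(cells) for cells in inner)
prompts = split_regions(
    "blue sky BREAK\ngreen hair BREAK\nbook shelf BREAK\n"
    "terrarium on the desk BREAK\norange dress and sofa"
)
```

Under this reading, the advanced example gives a first row split into three cells and a taller second row split into two, which matches its five prompts.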
Prompt and Prompt-EX
In Prompt mode, overlapping regions are added together. In Prompt-EX mode, overlapping regions are overwritten sequentially. Regions are processed in order, so define larger regions first; this preserves the effect of the smaller regions that follow.
Prompt-EX example:
Mode: Prompt-EX
Prompt: a girl in street with shirt, tie, skirt BREAK
red, shirt BREAK
green, tie BREAK
blue, skirt
Prompt thresholds: 0.4,0.6,0.6
Threshold
The threshold determines the mask created by the prompt. Set one threshold per mask, separated by commas, in the same order as the BREAK-separated prompts. Suitable values vary widely depending on the target: hair, whose boundaries are ambiguous, needs small values, while a face needs larger values.
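To see how thresholds line up with prompts, the sketch below pairs them for the Prompt-EX example. It assumes the first BREAK segment is the base prompt and receives no threshold, which is inferred from that example having four segments and three thresholds; `pair_thresholds` is a hypothetical helper.

```python
def pair_thresholds(prompt, thresholds):
    """Pair each region prompt with its threshold, in BREAK order."""
    base, *regions = [part.strip() for part in prompt.split("BREAK")]
    values = [float(t) for t in thresholds.split(",")]
    if len(values) != len(regions):
        raise ValueError(f"{len(regions)} regions need {len(regions)} thresholds, got {len(values)}")
    return base, list(zip(regions, values))


base, pairs = pair_thresholds(
    "a girl in street with shirt, tie, skirt BREAK\nred, shirt BREAK\ngreen, tie BREAK\nblue, skirt",
    "0.4,0.6,0.6",
)
```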
Power
Controls how strongly regional prompting is applied during image generation.
ResAdapter
ResAdapter is a resolution adapter enabling any diffusion model to generate resolution-free images without additional training or inference overhead.
| Models | Parameters | Resolution Range | Ratio Range |
|---|---|---|---|
| resadapter_v2_sd1.5 | 0.9M | 128 <= x <= 1024 | 0.28 <= r <= 3.5 |
| resadapter_v2_sdxl | 0.5M | 256 <= x <= 1536 | 0.28 <= r <= 3.5 |
| resadapter_v1_sd1.5 | 0.9M | 128 <= x <= 1024 | 0.5 <= r <= 2 |
| resadapter_v1_sd1.5_extrapolation | 0.9M | 512 <= x <= 1024 | 0.5 <= r <= 2 |
| resadapter_v1_sd1.5_interpolation | 0.9M | 128 <= x <= 512 | 0.5 <= r <= 2 |
| resadapter_v1_sdxl | 0.5M | 256 <= x <= 1536 | 0.5 <= r <= 2 |
| resadapter_v1_sdxl_extrapolation | 0.5M | 1024 <= x <= 1536 | 0.5 <= r <= 2 |
| resadapter_v1_sdxl_interpolation | 0.5M | 256 <= x <= 1024 | 0.5 <= r <= 2 |
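The table above can be turned into a quick pre-flight check. This sketch makes two assumptions that the table does not spell out: that x applies to both width and height, and that r is height divided by width. Only two rows are included, and `fits` is a hypothetical helper.

```python
# (min side, max side, min ratio, max ratio), copied from the table above
RANGES = {
    "resadapter_v2_sd1.5": (128, 1024, 0.28, 3.5),
    "resadapter_v2_sdxl": (256, 1536, 0.28, 3.5),
}


def fits(model, width, height):
    """Return True if the requested size is inside the model's listed ranges."""
    low, high, ratio_low, ratio_high = RANGES[model]
    ratio = height / width  # assumption: r is height / width
    sides_ok = low <= width <= high and low <= height <= high
    return sides_ok and ratio_low <= ratio <= ratio_high


ok_sdxl = fits("resadapter_v2_sdxl", 1024, 1024)
too_wide = fits("resadapter_v2_sd1.5", 1536, 512)
```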
Weight
Controls how strongly ResAdapter is applied during image generation.
T-Gate
T-Gate generates images more efficiently by caching and reusing attention outputs at scheduled time steps. Experiments show that T-Gate applies broadly to existing text-conditional diffusion models and speeds them up by 10-50%.
Simply enable T-Gate in the scripts and experiment with the steps to see what works best for your needs.
Text-to-Video
Text-to-Video is a built-in script that makes animated art much easier to create. It offers multiple models, and the best choice depends on your configuration and personal preference.
To use it, select the script in the scripts list and choose the desired number of frames. Fill in your positive prompt, negative prompt, and other settings as you normally would, then choose the desired output format and click Generate.
DemoFusion
The DemoFusion framework extends open-source GenAI models with Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. Its progressive nature requires more passes, but the intermediate results can serve as "previews" that help you iterate on prompts quickly. You can find more information in the DemoFusion project documentation.
- Denoising batch size: The batch size for multiple denoising paths. Typically, a larger batch size can result in higher efficiency but comes with increased GPU memory requirements.
- Stride: The stride of moving local patches. A smaller stride is better for alleviating seam issues, but it also introduces additional computational overhead and inference time.
- Cosine_scale_1: Controls the decreasing rate of skip-residual. A smaller value results in better consistency with low-resolution results, but it may lead to more pronounced upsampling noise. See Appendix C in the DemoFusion paper.
- Cosine_scale_2: Controls the decreasing rate of dilated sampling. A smaller value can better address the repetition issue, but it may lead to grainy images. See Appendix C in the DemoFusion paper.
- Cosine_scale_3: Controls the decreasing rate of the Gaussian filter. A smaller value results in less grainy images, but it may lead to over-smoothed images. See Appendix C in the DemoFusion paper.
- Sigma: The standard deviation of the Gaussian filter. A larger sigma promotes the global guidance of dilated sampling, but it can cause over-smoothing.
- Multi_decoder: Determines whether to use a tiled decoder. Generally, a tiled decoder becomes necessary when the resolution exceeds 3072×3072 on an RTX 3090 GPU.
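The trade-off described for Stride can be illustrated with a generic sliding-window sketch. This is not DemoFusion's code: `patch_starts` is a hypothetical helper, and the window size and units are made up for illustration. It only shows that a smaller stride produces more overlapping patches along one axis, and therefore more denoising work.

```python
def patch_starts(length, window, stride):
    """Return the start offsets of sliding patches that cover one axis."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window, stride))
    starts.append(length - window)  # make sure the last patch reaches the edge
    return starts


coarse = patch_starts(256, window=128, stride=64)
fine = patch_starts(256, window=128, stride=32)
print(len(coarse), len(fine))
```

Because patches are taken along both axes, the total patch count grows roughly with the square of the per-axis count.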