
ZLUDA

ZLUDA is a CUDA wrapper that allows applications to run on normally unsupported GPUs, such as AMD GPUs, on Windows.

Warning

ZLUDA support is unofficial and limited at this time.

Installing ZLUDA for AMD GPUs in Windows

Note

This guide assumes you have Git and Python installed,
and are comfortable using the command prompt, navigating Windows Explorer, renaming files and folders, and working with zip files.

Important

If you have an integrated AMD GPU (iGPU), you may need to disable it,
or use the HIP_VISIBLE_DEVICES environment variable.
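For example, to hide the iGPU and expose only the discrete GPU to HIP, you can set the variable before launching (the device index is an assumption; check which index maps to which GPU on your system):

```shell
:: Expose only device 0 (typically the discrete GPU) to HIP,
:: then launch as usual
set HIP_VISIBLE_DEVICES=0
.\webui.bat --use-zluda
```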

Install Visual C++ Runtime

Note

Most users will already have this installed, since it comes bundled with many games, but there's no harm in running the installer anyway.

Grab the latest version of Visual C++ Runtime from https://aka.ms/vs/17/release/vc_redist.x64.exe (this is a direct download link) and then run it.
If you get the options to Repair or Uninstall, then you already have it installed and can click Close. Otherwise, install it.

Install ZLUDA

ZLUDA is now installed automatically, and added to PATH, when webui.bat is started with --use-zluda.

Install HIP SDK

Install HIP SDK 6.2 from https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
So long as your regular AMD GPU driver is up to date, you don't need to install the PRO driver HIP SDK suggests.

Replace HIP SDK library files for unsupported GPU architectures

Go to https://rocm.docs.amd.com/projects/install-on-windows/en/develop/reference/system-requirements.html and find your GPU model.
If your GPU model has a ✅ in both columns, skip to Install SD.Next.
If your GPU model has a ❌ in the HIP SDK column, or if your GPU isn't listed, follow the instructions below:

  1. Open Windows Explorer and copy and paste C:\Program Files\AMD\ROCm\6.2\bin\rocblas into the location bar.
    (Assuming you've installed the HIP SDK in the default location and Windows is located on C:)
  2. Make a copy of the library folder, for backup purposes.
  3. Download the unofficial rocBLAS library build matching your GPU architecture:
    gfx1010: RX 5700, RX 5700 XT
    gfx1012: RX 5500, RX 5500 XT
    gfx1031: RX 6700, RX 6700 XT, RX 6750 XT
    gfx1032: RX 6600, RX 6600 XT, RX 6650 XT
    gfx1103: Radeon 780M
    gfx803: RX 570, RX 580
    More...
  4. Open the zip file.
  5. Drag and drop the library folder from the zip file into %HIP_PATH%bin\rocblas (the folder you opened in step 1), overwriting any existing files.
  6. Reboot your PC.
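The steps above can be sketched as a Command Prompt session (paths assume a default HIP SDK 6.2 install; the downloaded zip file name is hypothetical):

```shell
:: Go to the rocBLAS folder and back up the original library folder
cd /d "C:\Program Files\AMD\ROCm\6.2\bin\rocblas"
xcopy /E /I library library.backup

:: Extract the downloaded archive here, overwriting existing files
:: (tar ships with Windows 10 and later; zip name is hypothetical)
tar -xf "%USERPROFILE%\Downloads\rocblas-gfx1031.zip" -C .
```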

If your GPU model has a ❌ in the HIP SDK column and is not in the list above, follow the instructions in the ROCm Support guide to build your own rocBLAS libraries.

Warning

Building your own libraries is not for the faint of heart

Install SD.Next

Using Windows Explorer, navigate to a place you'd like to install SD.Next. This should be a folder which your user account has read/write/execute access to. Installing SD.Next in a directory which requires admin permissions may cause it to not launch properly.

Note: Refrain from installing SD.Next into the Program Files, Users, or Windows folders (this includes the OneDrive folder and the Desktop), or into a folder whose name begins with a period (e.g. .sdnext).

For faster model loading, an SSD is the best location.

In the Location Bar, type cmd, then hit [Enter]. This will open a Command Prompt window at that location.


Copy and paste the following commands into the Command Prompt window, one at a time;

git clone https://github.com/vladmandic/sdnext
cd sdnext
.\webui.bat --use-zluda --debug --autolaunch

Compilation and First Generation

Now, try to generate something. The first generation triggers compilation and can take a fair while (10-15 minutes, or even longer; some reports state over an hour), but this compilation only needs to be done once.
Note: The text Compilation is in progress. Please wait... will repeatedly appear, just be patient. Eventually your image will start generating.
Subsequent generations will be significantly quicker.

Upgrading ZLUDA

If you have problems with ZLUDA after updating SD.Next, upgrading ZLUDA may help.

  1. Remove .zluda folder.
  2. Launch WebUI. The installer will download and install newer ZLUDA.

※ As with the first generation, you may have to wait a while for compilation.
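From the sdnext folder, the two steps look like:

```shell
:: Delete the cached ZLUDA install; webui.bat re-downloads the
:: latest ZLUDA automatically when relaunched with --use-zluda
rmdir /S /Q .zluda
.\webui.bat --use-zluda
```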

Experimental features

cuDNN

Speed-up: ★★★☆☆
VRAM: ★★★★☆
Stability: ★★★☆☆
Compatible with: Navi cards

MIOpen, the equivalent of cuDNN for AMD GPUs, hasn't been released on Windows yet.

However, you can enable it with a custom build of MIOpen.

This section describes how to enable cuDNN.

  1. Install HIP SDK 6.2. If you already have an older HIP SDK, uninstall it before installing 6.2.
  2. Download and install the HIP SDK extension from here.
    (unzip, then paste the folders over path/to/AMD/ROCm/6.2)
  3. Remove the .zluda folder if it exists.
  4. Launch WebUI with the command line arguments --use-zluda --use-nightly.
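Step 4, run from the sdnext folder, looks like:

```shell
.\webui.bat --use-zluda --use-nightly
```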

The first generation will take a long time because MIOpen has to find the optimal solution and cache it.

If you get driver crashes, restart webui and try again.

cuBLASLt

Speed-up: ★☆☆☆☆
VRAM: ★☆☆☆☆
Stability: ★★☆☆☆
Compatible with: gfx1100, or CDNA accelerators

hipBLASLt, the equivalent of cuBLASLt for AMD GPUs, hasn't been released on Windows yet.

However, there are unofficial builds available.

This section describes how to enable cuBLASLt.

  1. Install HIP SDK 6.2. If you already have an older HIP SDK, uninstall it before installing 6.2.
  2. Download and install the HIP SDK extension from here.
    (unzip, then paste the folders over path/to/AMD/ROCm/6.2)
  3. Remove the .zluda folder if it exists.
  4. Launch WebUI with the command line arguments --use-zluda --use-nightly.

triton

Speed-up: ★★★★★
VRAM: ★★★★☆
Stability: ★★★★☆
Compatible with: Navi cards

  1. Prepare Python 3.11 (or 3.10) environment.
  2. Download triton wheel from here.
  3. Install via pip.

pip install --upgrade path/to/triton-3.3.0+gitbb314b47-cp311-cp311-win_amd64.whl setuptools
(cp311 should be replaced with cp310 if you have Python 3.10)

※ Developer PowerShell for Visual Studio (or the Developer Command Prompt) is needed to compile kernels using triton.

Flash Attention 2

Using triton, you can enable Flash Attention 2.

  1. Go to Settings.
  2. Set attention method to Scaled Dot-product.
  3. Enable Flash attention.
  4. Restart WebUI.

torch.compile

Using triton, you can enable torch.compile.

  1. Go to Settings.
  2. Enable compilation.
  3. Set compilation method to inductor or cuda-graph.

torch.compile is currently not compatible with flash attention 2 on ZLUDA.


Comparison (DirectML)

|                 | DirectML      | ZLUDA       |
|-----------------|---------------|-------------|
| Speed           | Slower        | Faster      |
| VRAM Usage      | More          | Less        |
| VRAM GC         |               |             |
| Training        |               | *           |
| Flash Attention |               |             |
| FFT             |               | ⚠️          |
| DNN             |               |             |
| RTC             |               |             |
| Source Code     | Closed-source | Open-source |

❓: unknown
⚠️: partially supported
*: known as possible, but uses too much VRAM to train stable diffusion models/LoRAs/etc.

Compatibility

| DTYPE | Support |
|-------|---------|
| FP64  |         |
| FP32  |         |
| FP16  |         |
| BF16  |         |
| LONG  |         |
| INT8  |         |
| UINT8 | ✅*     |
| INT4  |         |
| FP8   | ⚠️      |
| BF8   | ⚠️      |

*: Not tested.