OpenVINO

OpenVINO is an open-source toolkit for optimizing and deploying deep learning models.
Compiles models for your hardware.
Supports Linux and Windows.
Supports CPU / GPU / iGPU / NPU
Supports AMD GPUs on Windows with FP16 support.
Supports Intel dGPUs and iGPUs.
Supports NVIDIA GPUs.
Supports CPUs with BF16 and INT8 support.
Supports multiple devices at the same time using Hetero Device.

It is basically a TensorRT / Olive competitor that works with any hardware.

Installation

Preparations

Install the drivers for your device
Install Git and Python
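
Before cloning, it can help to confirm that both tools are actually on your PATH. The snippet below is only a minimal sketch for Linux shells: missing_tools is a helper name made up for this example, and python3 is an assumed command name (on Windows it is usually python).

```shell
# missing_tools is a hypothetical helper, not part of SD.Next.
# It prints the names of the given commands that are not found on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing:+$missing }$tool"
        fi
    done
    echo "$missing"
}

# Assumed command names; adjust them for your system.
not_found="$(missing_tools git python3)"
if [ -n "$not_found" ]; then
    echo "Install these first: $not_found"
else
    echo "Git and Python found."
fi
```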

Note

Do not mix OpenVINO with your old install. Treat OpenVINO as a separate backend.

Running SD.Next with OpenVINO

Open CMD / Terminal in the folder where you want to install SD.Next, then clone SD.Next from GitHub with this command:

git clone https://github.com/vladmandic/sdnext

Then enter the sdnext folder:

cd sdnext

Then start WebUI with this command:

Windows:

.\webui.bat --use-openvino

Linux:

./webui.sh --use-openvino

Note

The first run installs the necessary libraries, so it will take a while depending on your internet connection.

Running SD.Next with Docker

Check out the Docker wiki if you want to build a custom Docker image.

Using Docker with a prebuilt image:

export SDNEXT_DOCKER_ROOT_FOLDER=~/sdnext
sudo docker run -it \
  --name sdnext-openvino \
  --device /dev/dri \
  -p 7860:7860 \
  -v $SDNEXT_DOCKER_ROOT_FOLDER/app:/app \
  -v $SDNEXT_DOCKER_ROOT_FOLDER/python:/mnt/python \
  -v $SDNEXT_DOCKER_ROOT_FOLDER/data:/mnt/data \
  -v $SDNEXT_DOCKER_ROOT_FOLDER/models:/mnt/models \
  -v $SDNEXT_DOCKER_ROOT_FOLDER/huggingface:/root/.cache/huggingface \
  disty0/sdnext-openvino:latest
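
The --device /dev/dri flag in the command above passes the host's GPU device nodes into the container, so the container cannot see a GPU if that folder has no render node. A minimal pre-flight sketch, assuming a Linux host where GPU render nodes appear as /dev/dri/renderD*; has_render_node is a hypothetical helper name, not part of SD.Next or Docker:

```shell
# has_render_node is a hypothetical helper for this example.
# It succeeds if the given directory (default: /dev/dri) contains
# at least one entry whose name starts with renderD.
has_render_node() {
    dri_dir="${1:-/dev/dri}"
    for node in "$dri_dir"/renderD*; do
        # If the glob matched nothing, $node is the literal pattern
        # and the existence check fails.
        if [ -e "$node" ]; then
            return 0
        fi
    done
    return 1
}

if has_render_node; then
    echo "Render node found in /dev/dri."
else
    echo "No render node in /dev/dri; check your GPU drivers."
fi
```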

Note

The first run installs the necessary libraries, so it will take a while depending on your internet connection.
The resulting Docker image uses 1.1 GB of disk space (uncompressed), plus 2.5 GB for the venv.

More Info

Limitations

The same limitations as TensorRT / Olive apply here too.
Compilation takes a few minutes, and using LoRAs triggers recompilation.
Attention Slicing and HyperTile will not work.

Quantization

Quantization enables 8-bit support without autocast.
Enable the OpenVINO Quantize Models with NNCF option in Compute Settings to use it.

Note

Quantization has a noticeable impact on quality and is generally not recommended.

Model Compression

Enable the Compress Model weights with NNCF option in Compute Settings to use it.
To use 4-bit compression, select a 4-bit mode in the OpenVINO compress mode for NNCF option.
For GPUs, select both CPU and GPU in the device selection if you want to use the GPU with Model Compression.

Note

The VAE will still be compressed to INT8 if you use a 4-bit mode.

Custom Devices

Use the OpenVINO devices to use option in Compute Settings if you want to specify a device.
Selecting multiple devices combines them into a single HETERO device.
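
OpenVINO names such a combined device with a HETERO: prefix followed by a comma-separated priority list, for example HETERO:GPU,CPU. As a rough sketch of how a selection could map to that name (hetero_device_name is a hypothetical helper, and moving CPU to the end of the list as a fallback is an assumption made for this example, not confirmed SD.Next behaviour):

```shell
# hetero_device_name is a hypothetical helper for this example.
# One device: print it unchanged.
# Several devices: print HETERO:<list>, with CPU moved to the end
# (assumed ordering, so that the CPU acts as the fallback).
hetero_device_name() {
    if [ "$#" -eq 0 ]; then
        return 1
    fi
    if [ "$#" -eq 1 ]; then
        echo "$1"
        return 0
    fi
    joined=""
    has_cpu=""
    for device in "$@"; do
        if [ "$device" = "CPU" ]; then
            has_cpu="yes"
        else
            joined="${joined:+$joined,}$device"
        fi
    done
    if [ -n "$has_cpu" ]; then
        joined="${joined:+$joined,}CPU"
    fi
    echo "HETERO:$joined"
}

hetero_device_name CPU GPU   # prints HETERO:GPU,CPU
```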

Starting the WebUI with the --device-id CLI argument selects the GPU with the specified Device ID.
Starting the WebUI with the --use-cpu openvino CLI argument selects the CPU.

If no device is specified, the default OpenVINO device is auto-detected with the following priority:
GPU -> GPU.1 -> GPU.0 -> Last device in the OpenVINO available devices list (CPU)
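
The priority order above can be sketched as a small shell function. This is an illustration of the stated order only, not the code SD.Next runs: pick_default_device is a hypothetical name, and the device list is passed in by hand as a space-separated string rather than queried from OpenVINO.

```shell
# pick_default_device is a hypothetical helper for this example.
# $1 is a space-separated list of available device names.
# It prints the first match among GPU, GPU.1, GPU.0 (in that order),
# or the last device in the list if none of them is present.
pick_default_device() {
    available="$1"
    for candidate in GPU GPU.1 GPU.0; do
        # $available is left unquoted on purpose so it splits on spaces.
        for device in $available; do
            if [ "$device" = "$candidate" ]; then
                echo "$candidate"
                return 0
            fi
        done
    done
    last=""
    for device in $available; do
        last="$device"
    done
    echo "$last"
}

pick_default_device "CPU GPU.0 GPU.1"   # prints GPU.1
```

With only "CPU" in the list, none of the GPU names match and the function falls through to the last entry, which is CPU.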

Model Caching

OpenVINO can save compiled models to a cache folder so you won't have to compile them again.
The OpenVINO disable model caching option in Compute Settings turns caching off. Leave this option disabled to keep caching enabled.
The Directory for OpenVINO cache option in System Paths sets a new location for saving OpenVINO caches.