This article was originally published on Medium*.
Figure 1. A Stable Diffusion image result of an image of a waterfall plus a text prompt of "Mars waterfall," run on the Intel® Data Center GPU Max Series 1100.
Oct. 15, 2023 — Stable Diffusion* models have become a great way for creators, artists, and designers to quickly prototype visual ideas without hiring outside help. If you have ever used a Stable Diffusion model, you might be familiar with giving a text prompt to generate an image. There are also models that accept both a text prompt and a starting image to generate a new image. In this article, I show how I ran inference with image-to-image Stable Diffusion models on Intel's just-released Intel® Data Center GPU Max Series 1100.
I ran two different Stable Diffusion models for image-to-image generation, both hosted on Hugging Face*. Though the two models are used primarily for text-to-image generation, they work for image-to-image as well:
Stability AI with Stable Diffusion v2–1 Model
The Stability AI Stable Diffusion v2–1 model was trained on an impressive cluster of 32 x 8 x A100 GPUs (256 GPU cards total). It was fine-tuned from the Stable Diffusion v2 model. The training data was a subset of the LAION-5B dataset, with the subset created by the DeepFloyd team at Stability AI. As of this writing, LAION-5B is the largest text-image pair dataset, with over 5.85 billion text-image pairs. Figure 2 shows a few samples from the dataset.
Figure 2. Samples of cat images from the LAION-5B dataset.
Image Source
The sample images reveal that the original images come in a variety of pixel sizes; in practice, however, training these models usually involves padding or resizing the images to a consistent pixel size to match the model architecture (see the short preprocessing sketch after the list below).
The breakdown of the dataset is as follows:
- Laion2B-en: 2.32 billion text-image pairs in English
- Laion2B-multi: 2.26 billion text-image pairs from over 100 other languages
- Laion1B-nolang: 1.27 billion text-image pairs with an undetectable language
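The same consistent-size point applies when you feed your own image into an image-to-image pipeline. As a minimal illustration, here is a sketch (using Pillow; the file name is a hypothetical placeholder) of the kind of resizing typically applied before inference:

from PIL import Image

# Load a starting image (hypothetical file name) and resize it to a
# consistent resolution, such as the common 512 x 512 training size.
init_image = Image.open("waterfall.jpg").convert("RGB")
init_image = init_image.resize((512, 512))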
Tracing the path of how these models were trained is a bit convoluted; the full story, along with more training details, can be found on the Stability AI Stable Diffusion v2–1 Hugging Face model card. I should mention that I have repeated this description from a previous article on text-to-image Stable Diffusion, as it is the same model.
Runway ML with Stable Diffusion v1–5 Model
The Runway ML Stable Diffusion v1–5 model was initialized from the weights of the earlier Stable Diffusion v1–2 checkpoint and then fine-tuned for an additional 595k steps at a resolution of 512 x 512. One of its advantages is that it is relatively lightweight: "With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM." (source on GitHub). The Intel Data Center GPU Max Series 1100 has 48 GB of VRAM, which is plenty for this model.
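As a rough sketch of what loading this checkpoint looks like with the Hugging Face diffusers library (the model ID is its public Hugging Face Hub name; moving the pipeline to the "xpu" device assumes Intel® Extension for PyTorch* is installed), it might look like this:

import torch
import intel_extension_for_pytorch as ipex  # importing this registers the "xpu" device
from diffusers import StableDiffusionImg2ImgPipeline

# Load the Runway ML v1-5 checkpoint in half precision to reduce VRAM use.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("xpu")  # move the pipeline onto the Intel GPU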
The Intel GPU Hardware
As I just mentioned, the particular GPU I used for my inference test is the Intel Data Center GPU Max 1100, which has 48 GB of memory, 56 Xe-cores, and a 300 W thermal design power. On the command line, I can first verify that I do in fact have the GPUs I expect by running:
clinfo -l
And I get an output showing that I have access to four Intel GPUs on the current node:
Platform #0: Intel(R) OpenCL Graphics
+-- Device #0: Intel(R) Data Center GPU Max 1100
+-- Device #1: Intel(R) Data Center GPU Max 1100
+-- Device #2: Intel(R) Data Center GPU Max 1100
`-- Device #3: Intel(R) Data Center GPU Max 1100
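You can also confirm device visibility from Python. Here is a minimal sketch, assuming Intel Extension for PyTorch is installed (importing it exposes the torch.xpu namespace):

import torch
import intel_extension_for_pytorch as ipex  # exposes the torch.xpu namespace

# Count and name the Intel GPUs visible to PyTorch on this node.
print(torch.xpu.device_count())  # expect 4 on this node
for i in range(torch.xpu.device_count()):
    print(torch.xpu.get_device_name(i))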
Similar to the nvidia-smi tool, you can run xpu-smi on the command line, with a few options selected, to get the statistics you want on GPU use:
xpu-smi dump -d 0 -m 0,5,18
The result is a printout, once per second, of key usage statistics for device 0 (the -m 0,5,18 flags select GPU utilization, GPU memory utilization, and GPU memory used):
getpwuid error: Success
Timestamp, DeviceId, GPU Utilization (%), GPU Memory Utilization (%), GPU Memory Used (MiB)
13:34:51.000, 0, 0.02, 0.05, 28.75
13:34:52.000, 0, 0.00, 0.05, 28.75
13:34:53.000, 0, 0.00, 0.05, 28.75
13:34:54.000, 0, 0.00, 0.05, 28.75
Run the Stable Diffusion Image-To-Image Examples
My colleague, Rahul Nair, wrote the Stable Diffusion image-to-image Jupyter* Notebook that is hosted directly on the Intel® Developer Cloud. It gives you the option of using either model that I outlined earlier. Here are the steps you can take to get started:
- Go to Intel Developer Cloud.
- Register as a standard user.
- Once you are logged in, go to the Training and Workshops section.
- Select GenAI Launch Jupyter Notebook. You can find the Stable Diffusion image-to-image Jupyter Notebook there and run it.
In the Jupyter Notebook, Intel® Extension for PyTorch* (IPEX) was used to speed up inference. One of the key functions is _optimize_pipeline, where ipex.optimize is called to optimize each component of the StableDiffusionImg2ImgPipeline object.
# Imports assumed by this excerpt (not shown in the original snippet):
import intel_extension_for_pytorch as ipex
from torch import nn
from diffusers import StableDiffusionImg2ImgPipeline

def _optimize_pipeline(
    self, pipeline: StableDiffusionImg2ImgPipeline
) -> StableDiffusionImg2ImgPipeline:
    """
    Optimize the pipeline of the model.

    Args:
        pipeline (StableDiffusionImg2ImgPipeline): The pipeline to optimize.
    Returns:
        StableDiffusionImg2ImgPipeline: The optimized pipeline.
    """
    # Walk the pipeline's attributes and wrap every torch.nn.Module
    # component (UNet, VAE, text encoder) with ipex.optimize.
    for attr in dir(pipeline):
        if isinstance(getattr(pipeline, attr), nn.Module):
            setattr(
                pipeline,
                attr,
                ipex.optimize(
                    getattr(pipeline, attr).eval(),  # set to inference mode
                    dtype=pipeline.text_encoder.dtype,
                    inplace=True,
                ),
            )
    return pipeline
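To make the surrounding plumbing concrete, here is a minimal, self-contained sketch of how an image-to-image pipeline like this might be invoked once it has been loaded and optimized. The URL, prompt, and parameter values are illustrative placeholders, not the notebook's actual defaults:

from io import BytesIO

import requests
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Load the v1-5 checkpoint and move it to the Intel GPU ("xpu" assumes
# Intel Extension for PyTorch is installed and imported).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("xpu")

# Fetch a starting image from a URL (placeholder) and resize it.
response = requests.get("https://example.com/waterfall.jpg")
init_image = Image.open(BytesIO(response.content)).convert("RGB").resize((512, 512))

# Generate new images guided by both the prompt and the starting image;
# strength controls how far the result may drift from the input image.
result = pipe(
    prompt="a waterfall on Mars",
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
    num_images_per_prompt=4,
)
result.images[0].save("mars_waterfall.png")

The mini UI described next collects these same inputs (model, image URL, prompt, number of images) without any code.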
Figure 3 shows the handy mini user interface within the Jupyter Notebook itself for image-to-image generation. Select one of the models, enter the desired image URL, enter a prompt, select the number of images to generate, and you're off, creating your own images.
Figure 3. A mini user interface for the image-to-image interface within the Jupyter Notebook.
Figures 1 and 4 show samples of the results: entirely new images generated from text + image prompts that I ran on this Intel GPU. I thought it was neat to start with a real Earth nature photo of a waterfall, tell the model to make a Mars waterfall, and see the adaptation to a red-colored landscape (Figure 1). In Figure 4, the model transformed an image of Jupiter to have some Earth-like continental structure, while staying mostly red and keeping some of Jupiter's distinctive features.
Figure 4. A Stable Diffusion image result of an image of the planet Jupiter plus a text prompt of Earth, run on the latest Intel Data Center GPU Max Series 1100.
I was able to generate these images by running through the Jupyter Notebook, and inference completes in a matter of seconds. Feel free to share your images with me over social media by connecting with me through the following links. Also, please let me know if you have any questions or would like help getting started with Stable Diffusion.
You can reach me on: