3x Faster Stable Diffusion Models
This tutorial demonstrates how to use the pruna
package to optimize any custom Stable Diffusion model. We will use the stable-diffusion-v1-4
model as an example. Any execution times given below were measured on a T4 GPU.
1. Loading the Stable Diffusion Model
First, load your Stable Diffusion model.
[ ]:
import torch
from diffusers import StableDiffusionPipeline
# Define the model ID
model_id = "CompVis/stable-diffusion-v1-4"
# Load the pre-trained model
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
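If you want to verify the speed-up yourself, you can optionally time the unoptimized pipeline before smashing it. The snippet below is a minimal sketch, not part of the original tutorial: it measures a single baseline generation on the GPU, and the prompt is only an example.
[ ]:
import time
# Hypothetical baseline measurement: time one generation with the
# unoptimized pipeline so you can compare it with the smashed model later.
prompt = "a photo of an astronaut riding a horse on mars"
torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt)
torch.cuda.synchronize()
print(f"Baseline inference time: {time.perf_counter() - start:.2f} s")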
2. Initializing the Smash Config
Next, initialize the smash_config.
[ ]:
from pruna import SmashConfig
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['compilers'] = ['diffusers2', 'step_caching']
3. Smashing the Model
Now, smash the model. This can take up to 2 minutes. Don’t forget to replace the token with the one provided by PrunaAI.
[ ]:
from pruna import smash
# Smash the model
smashed_model = smash(
    model=pipe,
    token='<your_token>',  # replace <your_token> with your actual token, or set to None if you do not have one yet
    smash_config=smash_config,
)
4. Running the Model
After the model has been compiled, we run inference for a few iterations as a warm-up. This will take around 50 seconds.
[ ]:
# Define the prompt
prompt = "a photo of an astronaut riding a horse on mars"
# Run some warm-up iterations
for _ in range(5):
    smashed_model(prompt)
Finally, run the model to generate the image with accelerated inference.
[ ]:
# Define the prompt
prompt = "a photo of an astronaut riding a horse on mars"
# Generate and display the result
smashed_model(prompt).images[0]
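To put a number on the acceleration, you can time the smashed pipeline the same way as the optional baseline above. This is a minimal sketch that assumes you recorded a baseline earlier; the output file name is only an example.
[ ]:
import time
# Hypothetical timing sketch: measure one generation with the smashed model
# after warm-up, then save the image to disk.
torch.cuda.synchronize()
start = time.perf_counter()
image = smashed_model(prompt).images[0]
torch.cuda.synchronize()
print(f"Smashed inference time: {time.perf_counter() - start:.2f} s")
image.save("astronaut_on_mars.png")  # example output path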
Wrap Up
Congratulations! You have successfully smashed a Stable Diffusion model. You can now use the pruna
package to optimize any custom Stable Diffusion model. The only parts that you need to modify are steps 1 and 4 to fit your use case, as sketched below.
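As an illustration of those two steps, the sketch below swaps in a different checkpoint and prompt. The checkpoint name and prompt are assumptions, not part of the original tutorial; the smashing steps in between stay exactly as in steps 2 and 3.
[ ]:
# Step 1 adapted: load a different Stable Diffusion checkpoint (example only).
model_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# ... initialize the SmashConfig and call smash() exactly as in steps 2 and 3 ...

# Step 4 adapted: generate with your own prompt.
prompt = "a watercolor painting of a lighthouse at sunset"
smashed_model(prompt).images[0]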