Make Stable Diffusion 3x Faster with DeepCache


This tutorial demonstrates how to use the pruna package to reduce the latency of any U-Net-based diffusion model with DeepCache. We use the stable-diffusion-v1-4 model as an example, although the tutorial also applies to other popular diffusion models, such as SD-XL. To accelerate transformer-based diffusion models, check out the pruna_pro tutorial "Make Any Diffusion Model 3x Faster with Auto Caching".

1. Loading the Stable Diffusion Model

First, load your diffusion model.

[ ]:
import torch
from diffusers import StableDiffusionPipeline

# Define the model ID
model_id = "CompVis/stable-diffusion-v1-4"

# Load the pre-trained model
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
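As noted above, the same workflow applies to other U-Net-based pipelines such as SD-XL. As a minimal sketch of how the loading step would change (assuming the stabilityai/stable-diffusion-xl-base-1.0 checkpoint and the StableDiffusionXLPipeline class from diffusers), you could load SD-XL instead; the remaining steps stay the same.

from diffusers import StableDiffusionXLPipeline

# Load SD-XL instead of SD 1.4; the rest of the tutorial is unchanged
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")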

2. Initializing the Smash Config

Next, initialize the SmashConfig. In this example, we select DeepCache as the caching algorithm.

[ ]:
from pruna import SmashConfig

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['cacher'] = 'deepcache'
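DeepCache speeds up inference by reusing the U-Net's high-level features across neighboring denoising steps and only recomputing them at a fixed interval, so a larger interval trades some quality for additional speed. If you want to tune this, the interval can typically be set through the SmashConfig; the key name below (deepcache_interval) is an assumption, so check the pruna configuration reference for the exact hyperparameter.

# Optional: refresh the cached U-Net features less often for a larger speed-up
# (the 'deepcache_interval' key is an assumption; verify it in the pruna docs)
smash_config['deepcache_interval'] = 3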

3. Smashing the Model

Now, smash the model. This only takes a few seconds.

[ ]:
from pruna import smash

# Smash the model
smashed_model = smash(
    model=pipe,
    smash_config=smash_config,
)

4. Running the Model

Finally, run the model to generate an image with accelerated inference.

[ ]:
# Define the prompt
prompt = "a fruit basket"

# Generate and display the image
smashed_model(prompt).images[0]
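If you want to verify the speed-up on your own hardware, a simple before/after latency comparison is enough. The sketch below is illustrative only: the time_generation helper is not part of pruna, and it reloads the baseline pipeline in case smash modified the original one in place. For stable numbers you would average over more runs and fix the number of inference steps.

import time

def time_generation(model, prompt, n_runs=3):
    # Warm-up run so CUDA kernels and caches are initialized
    model(prompt)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        model(prompt)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs

# Reload the baseline pipeline for a fair comparison
baseline = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

print(f"Baseline latency: {time_generation(baseline, prompt):.2f} s")
print(f"Smashed latency:  {time_generation(smashed_model, prompt):.2f} s")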

Wrap Up

Congratulations! You have successfully smashed a Stable Diffusion model! You can now use the pruna package to optimize any U-Net-based diffusion model. The only parts that you should modify are step 1 and step 4 to fit your use case. Is the image quality not good enough? Or do you want to use caching with diffusion transformers such as FLUX or Hunyuan Video? Then check out the pruna_pro tutorial "Make Any Diffusion Model 3x Faster with Auto Caching" to take your optimization one step further.