Make Any Diffusion Model 3x Faster with Auto Caching (Pro)

This tutorial demonstrates how to use the pruna_pro package to optimize any diffusers pipeline. We use the stable-diffusion-v1-4 model as an example, although the tutorial also applies to other popular models, such as SD-XL, FLUX, and Hunyuan Video.

1. Loading the Stable Diffusion Model

First, load the pre-trained model and move it to the GPU.

[ ]:
import torch
from diffusers import StableDiffusionPipeline

# Define the model ID
model_id = "CompVis/stable-diffusion-v1-4"

# Load the pre-trained model
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
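
The loading step is the only part that changes if you want to apply the same recipe to another supported architecture, such as FLUX. The sketch below is illustrative: it assumes you have accepted the FLUX.1-dev license on the Hugging Face Hub and have enough GPU memory, and the model ID and dtype are example choices rather than part of this tutorial.

[ ]:
import torch
from diffusers import FluxPipeline

# Illustrative alternative: load a FLUX pipeline instead of Stable Diffusion.
# The rest of the tutorial (SmashConfig, smash, inference) stays the same.
flux_pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example model ID; requires accepting the license on the Hub
    torch_dtype=torch.bfloat16,
)
flux_pipe = flux_pipe.to("cuda")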

2. Initializing the Smash Config

Next, initialize the smash config (we use our proprietary auto caching algorithm). The speed factor controls the target latency of the smashed model: a speed factor of 0.5 results in a latency of approximately 0.5x that of the original model (roughly a 2x speedup), while 0.33 targets roughly a 3x speedup. Lower values are faster but incur more quality loss.

[ ]:
from pruna_pro import SmashConfig

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['cacher'] = 'auto'
smash_config['auto_speed_factor'] = 0.5  # roughly a 2x speedup; lower is faster but loses more quality
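
To target the roughly 3x speedup from the title instead, lower the speed factor accordingly. This is only an illustration; the actual speedup and quality impact depend on your model and hardware.

[ ]:
# Illustrative alternative: target roughly a 3x speedup (lower = faster, but more quality loss)
smash_config['auto_speed_factor'] = 0.33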

3. Smashing the Model

Now, smash the model using your Pruna Pro token. This only takes a few seconds.

[ ]:
from pruna_pro import smash

smashed_model = smash(
    model=pipe,
    token='<your_pruna_token>',
    smash_config=smash_config,
)

4. Running the Model

Finally, run the smashed model to generate an image with accelerated inference.

[ ]:
# Define the prompt
prompt = "a fruit basket"

# Generate and display the image
smashed_model(prompt).images[0]
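
If you want reproducible outputs or a quick latency check, the snippet below passes a few standard diffusers generation arguments and times the call. This is a sketch: it assumes the smashed pipeline forwards the usual diffusers keyword arguments, the output file name is just an example, and the measured numbers depend on your GPU, resolution, and step count.

[ ]:
import time

# Reproducible generation plus a rough latency measurement (numbers vary with hardware)
generator = torch.Generator("cuda").manual_seed(42)

torch.cuda.synchronize()
start = time.perf_counter()
image = smashed_model(
    prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
    generator=generator,
).images[0]
torch.cuda.synchronize()
print(f"Latency: {time.perf_counter() - start:.2f}s")

image.save("fruit_basket.png")  # example output path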

Wrap Up

Congratulations! You have successfully smashed a diffusion model! You can now use the pruna_pro package to optimize any diffusion model. Adjust steps 1, 2, and 4 to fit your use case. In particular, experiment with the auto_speed_factor to explore the trade-off between latency and quality and find the best configuration for your application.