Make Stable Diffusion 3x Faster with DeepCache
This tutorial demonstrates how to use the pruna
package to reduce the latency of any U-Net-based diffusion model with DeepCache.
We use the stable-diffusion-v1-4
model as an example, although the tutorial also applies to other popular diffusion models, such as SD-XL.
To accelerate transformer-based diffusion models, check out the pruna_pro
tutorial "Make Any Diffusion Model 3x Faster with Auto Caching".
1. Loading the Stable Diffusion Model
First, load the pre-trained diffusion model in half precision (float16) and move it to the GPU.
[ ]:
import torch
from diffusers import StableDiffusionPipeline
# Define the model ID
model_id = "CompVis/stable-diffusion-v1-4"
# Load the pre-trained model
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
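Optionally, record a baseline latency now so you can quantify the speedup after smashing. The timing helper below is illustrative and not part of pruna; it uses only the standard diffusers pipeline call.
[ ]:
import time

def time_pipeline(model, prompt, n_runs=3):
    # Average wall-clock latency of a text-to-image call, in seconds
    model(prompt)  # warm-up run so CUDA kernels are initialized
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        model(prompt)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs

baseline_latency = time_pipeline(pipe, "a fruit basket")
print(f"Baseline latency: {baseline_latency:.2f} s")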
2. Initializing the Smash Config
Next, initialize the smash config. In this example, we use DeepCache, which speeds up inference by caching the U-Net's high-level features and reusing them across adjacent denoising steps instead of recomputing them at every step.
[ ]:
from pruna import SmashConfig
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['cacher'] = 'deepcache'
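DeepCache trades a small amount of quality for speed: the larger the caching interval, the more full U-Net evaluations are skipped. Depending on your pruna version, this interval is exposed as a hyperparameter on the smash config; the name below follows pruna's documentation, but verify it against the SmashConfig options of your release.
[ ]:
# Optional: recompute the full U-Net only every 3 denoising steps.
# NOTE: hyperparameter name taken from pruna's docs; check your version.
smash_config['deepcache_interval'] = 3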
3. Smashing the Model
Now, smash the model. This only takes a few seconds, and the result is a drop-in replacement for the original pipeline.
[ ]:
from pruna import smash
# Smash the model
smashed_model = smash(
model=pipe,
smash_config=smash_config,
)
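Because the smashed model keeps the call signature of the original diffusers pipeline, the usual generation arguments still work. A minimal sketch, assuming pruna forwards these arguments to the wrapped pipeline:
[ ]:
# Standard diffusers arguments such as a seeded generator and a custom step
# count can be passed through the smashed pipeline unchanged.
generator = torch.Generator("cuda").manual_seed(42)
image = smashed_model(
    "a fruit basket",
    num_inference_steps=30,
    generator=generator,
).images[0]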
4. Running the Model
Finally, run the smashed model to generate an image with accelerated inference.
[ ]:
# Define the prompt
prompt = "a fruit basket"
# Run inference and display the resulting image
smashed_model(prompt).images[0]
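If you timed the original pipeline in step 1, you can now measure the speedup on your own hardware, reusing the illustrative `time_pipeline` helper defined there.
[ ]:
smashed_latency = time_pipeline(smashed_model, "a fruit basket")
print(f"Smashed latency: {smashed_latency:.2f} s")
print(f"Speedup: {baseline_latency / smashed_latency:.1f}x")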
Wrap Up
Congratulations! You have successfully smashed a Stable Diffusion model! You can now use the pruna
package to optimize any U-Net-based diffusion model.
The only parts you need to modify to fit your use case are steps 1 and 4. Is the image quality not good enough? Or do you want to use caching with diffusion transformers such as FLUX or Hunyuan Video?
Then check out the pruna_pro
tutorial "Make Any Diffusion Model 3x Faster with Auto Caching" to take your optimization one step further.