Make Any Diffusion Model 3x Faster with Auto Caching (Pro)
This tutorial demonstrates how to use the pruna_pro package to optimize any diffusers pipeline. We use the stable-diffusion-v1-4 model as an example, although the tutorial also applies to other popular models, such as SD-XL, FLUX, and Hunyuan Video.
1. Loading the Stable Diffusion Model
First, load your model.
import torch
from diffusers import StableDiffusionPipeline
# Define the model ID
model_id = "CompVis/stable-diffusion-v1-4"
# Load the pre-trained model
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
2. Initializing the Smash Config
Next, initialize the smash config. Here we select our proprietary auto caching algorithm. The speed factor sets the target latency relative to the original model: a speed factor of 0.33 results in a latency of approximately 0.33x the original, i.e. roughly a 3x speedup.
from pruna_pro import SmashConfig
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['cacher'] = 'auto'
smash_config['auto_speed_factor'] = 0.5  # Roughly a 2x speedup; lower values are faster but lose more quality
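To make the relationship between the speed factor and the expected speedup concrete, the arithmetic can be sketched as follows. The helper names `expected_latency` and `expected_speedup` are hypothetical and not part of pruna_pro, and the valid range of (0, 1] is an assumption made for this sketch. The numbers only restate the approximate relationship described above; actual latency depends on the model and hardware.

```python
def expected_latency(baseline_seconds: float, speed_factor: float) -> float:
    """Approximate latency after smashing: baseline scaled by the speed factor."""
    # Assumption for this sketch: the factor lies in (0, 1].
    if not 0 < speed_factor <= 1:
        raise ValueError("speed_factor is assumed to lie in (0, 1]")
    return baseline_seconds * speed_factor


def expected_speedup(speed_factor: float) -> float:
    """Approximate speedup, the reciprocal of the speed factor."""
    if not 0 < speed_factor <= 1:
        raise ValueError("speed_factor is assumed to lie in (0, 1]")
    return 1.0 / speed_factor


# Illustrative baseline of 4 seconds per image (a placeholder, not a measurement).
for factor in (1.0, 0.5, 0.33):
    latency = expected_latency(4.0, factor)
    print(f"factor={factor}: ~{latency:.2f}s, ~{expected_speedup(factor):.2f}x speedup")
```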
3. Smashing the Model
Now, smash the model. This only takes a few seconds.
from pruna_pro import smash
smashed_model = smash(
    model=pipe,
    token='<your_pruna_token>',
    smash_config=smash_config,
)
4. Running the Model
Finally, run the smashed model to generate an image with accelerated inference.
# Define the prompt
prompt = "a fruit basket"
# Display the result
smashed_model(prompt).images[0]
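To check how much faster the smashed model actually runs, you can time both pipelines yourself. The sketch below is a generic wall-clock timer, not part of pruna_pro; `measure_latency` and its `sync` parameter are hypothetical names introduced here, and the two sleep-based functions are stand-ins so the example runs without a GPU.

```python
import time
from statistics import mean
from typing import Callable, Optional


def measure_latency(
    fn: Callable[[], object],
    warmup: int = 1,
    runs: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time in seconds of one call to fn()."""

    def timed_call() -> float:
        if sync is not None:
            sync()  # make sure earlier asynchronous work has finished
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to finish
        return time.perf_counter() - start

    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    return mean(timed_call() for _ in range(runs))


# Stand-in workloads; replace them with real pipeline calls.
def slow_model() -> None:
    time.sleep(0.02)


def fast_model() -> None:
    time.sleep(0.01)


baseline = measure_latency(slow_model)
accelerated = measure_latency(fast_model)
print(f"measured speedup: {baseline / accelerated:.2f}x")
```

With the objects from this tutorial, you would pass `lambda: pipe(prompt)` and `lambda: smashed_model(prompt)` as `fn`, and `torch.cuda.synchronize` as `sync` so that GPU work is included in the timing. Measuring the original pipeline before smashing is the safer order, in case smashing modifies the pipeline in place.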
Wrap Up
Congratulations! You have successfully smashed a diffusion model! You can now use the pruna_pro package to optimize any diffusion model. Adjust steps 1, 2, and 4 to fit your use case. In particular, play around with the auto_speed_factor to explore the trade-off between latency and quality and find the best configuration for your application.
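One way to organize that exploration is to record a latency and a quality score for each speed factor you try, then pick the fastest setting that still meets your quality bar. The sketch below assumes you have your own quality metric (higher is better); `Trial` and `pick_fastest_acceptable` are hypothetical names, and the numbers are made-up placeholders, not measurements.

```python
from typing import NamedTuple, Optional, Sequence


class Trial(NamedTuple):
    speed_factor: float  # value used for auto_speed_factor
    latency_s: float     # measured seconds per image
    quality: float       # your own metric; higher is better


def pick_fastest_acceptable(
    trials: Sequence[Trial], min_quality: float
) -> Optional[Trial]:
    """Return the lowest-latency trial whose quality is at least min_quality."""
    acceptable = [t for t in trials if t.quality >= min_quality]
    if not acceptable:
        return None  # nothing meets the quality bar
    return min(acceptable, key=lambda t: t.latency_s)


# Placeholder values for illustration only; substitute your own measurements.
trials = [
    Trial(speed_factor=1.0, latency_s=4.0, quality=0.95),
    Trial(speed_factor=0.5, latency_s=2.0, quality=0.93),
    Trial(speed_factor=0.33, latency_s=1.3, quality=0.80),
]

best = pick_fastest_acceptable(trials, min_quality=0.9)
print(best)
```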