x2 smaller Sana in action
This tutorial demonstrates how to use the pruna
package to reduce the memory footprint of any diffusion model from the diffusers package by quantizing its weights from 16 bits down to 8 bits.
We will use the Sana_600M_512px
model as an example, but this tutorial works with any Stable Diffusion or Flux model.
If you also want a x2 speedup, have a look at the pruna_pro
tutorial “Shrink and accelerate Sana diffusion x4 smaller and x2 faster”.
1. Loading the Diffusion Model
First, load your diffusion model.
[ ]:
import torch
from diffusers import SanaPipeline
# Define the model ID
model_id = "Efficient-Large-Model/Sana_600M_512px_diffusers"
# Load the pre-trained model
pipe = SanaPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
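If you want to quantify the memory savings later, you can optionally record the GPU memory held by the fp16 pipeline now. This is a minimal sketch using standard PyTorch utilities, not part of the pruna API:
[ ]:
# Optional: record the GPU memory held by the fp16 weights as a baseline
baseline_gb = torch.cuda.memory_allocated() / 1024**3
print(f"GPU memory allocated before smashing: {baseline_gb:.2f} GB")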
2. Initializing the Smash Config
Next, initialize the smash_config (here, we make use of the HQQ quantization algorithm).
[ ]:
from pruna import SmashConfig
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['quantizer'] = 'hqq_diffusers'
smash_config['hqq_diffusers_weight_bits'] = 8
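As a side note, you can inspect the config before smashing, and HQQ also supports lower bit widths if you want to trade quality for memory. The 4-bit value below is illustrative; check the pruna documentation for the supported range:
[ ]:
# Optional: inspect the configuration before smashing
print(smash_config)
# HQQ also supports lower bit widths, e.g. 4 bits, for an even smaller but
# potentially lower-quality model (illustrative; see the pruna docs):
# smash_config['hqq_diffusers_weight_bits'] = 4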
3. Smashing the Model
Now, smash the model. This can take up to 30 seconds.
[ ]:
from pruna import smash
# Smash the model
smashed_model = smash(
model=pipe,
smash_config=smash_config,
)
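If you recorded a baseline in step 1, a quick check like the following (plain PyTorch, nothing pruna-specific) lets you verify the reduced footprint. Note that the old fp16 weights must have been released for the difference to show:
[ ]:
# Optional: compare GPU memory after quantization with the earlier baseline
smashed_gb = torch.cuda.memory_allocated() / 1024**3
print(f"GPU memory allocated after smashing: {smashed_gb:.2f} GB")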
4. Running the Model
Finally, run the model to generate the image.
[ ]:
# Define the prompt
prompt = "a smiling cat dancing on a table. Miyazaki style"
# Generate and display the image
smashed_model(prompt).images[0]
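The smashed pipeline accepts the same arguments as the original SanaPipeline, so standard diffusers generation parameters still apply. Here is a sketch, assuming the usual diffusers call signature; the parameter values are illustrative:
[ ]:
# Optional: pass standard diffusers generation parameters and save the image
# (these are regular SanaPipeline arguments, not pruna-specific; the values
# are illustrative)
image = smashed_model(prompt, num_inference_steps=20, guidance_scale=5.0).images[0]
image.save("smiling_cat.png")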
Wrap Up
Congratulations! You have successfully smashed a Sana model! You can now use the pruna
package to optimize any custom diffusion model. The only parts you need to modify are steps 1 and 4 to fit your use case.
Want to go one step further? Check out the pruna_pro
tutorial “Shrink and accelerate Sana diffusion x4 smaller and x2 faster” to also take advantage of accelerated inference!