Turbocharge Text-to-Video Generation (Pro)
This tutorial demonstrates how to use the pruna_pro package to optimize a video generation pipeline. We will use the Wan2.1-T2V model as an example.
1. Loading the Wan Text-to-Video Model
First, load your video generation model.
[ ]:
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
# Keep the Wan VAE in float32; load the rest of the pipeline in bfloat16 on the GPU.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")
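Before smashing, you can optionally check that the pipeline is set up as expected. This quick sanity check only uses standard diffusers/PyTorch attributes and is not required for the rest of the tutorial.
[ ]:
# Optional sanity check: the pipeline should sit on the GPU, with the
# transformer in bfloat16 and the VAE in float32.
print(pipe.device)
print(pipe.transformer.dtype)
print(pipe.vae.dtype)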
2. Initializing the Smash Config
Next, initialize the smash_config.
[ ]:
from pruna_pro import SmashConfig, smash
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['cacher'] = 'auto'
smash_config['auto_cache_mode'] = 'taylor'
smash_config['auto_speed_factor'] = 0.42  # targets roughly a 2.5x speedup; lower values are faster but lose more quality
smash_config['compiler'] = "torch_compile"
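The auto_speed_factor controls the speed/quality trade-off of the automatic cacher. If you prefer a milder optimization, you can raise it before smashing; the value below is illustrative, not a benchmarked recommendation.
[ ]:
# Illustrative alternative (assumed value): a higher auto_speed_factor keeps more
# of the original denoising work, so it is slower than 0.42 but closer to baseline quality.
# smash_config['auto_speed_factor'] = 0.6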
3. Smashing the Model
Now, you can smash the model, which will take about a minute. Don’t forget to replace the token with the one provided by PrunaAI.
[ ]:
# Smash the pipe
smashed_pipe = smash(
    model=pipe,
    token="<your_pruna_token>",
    smash_config=smash_config,
)
4. Running the Model
Finally, run the model to generate the video with accelerated inference.
[ ]:
# Warm-up: the first call compiles the model, so it will be slow
output = smashed_pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Bright tones, overexposed, static, blurred details.",
    height=480,
    width=480,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=50,
)
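If you want reproducible outputs across runs, you can create a seeded generator and pass it to the pipeline call below via the standard diffusers generator argument. The seed value here is arbitrary.
[ ]:
# Optional: seed the sampler for reproducible videos; pass it to the call below
# as `generator=generator` (the seed value 42 is arbitrary).
generator = torch.Generator(device="cuda").manual_seed(42)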
[ ]:
output = smashed_pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Bright tones, overexposed, static, blurred details.",
    height=480,
    width=480,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=50,
).frames[0]
export_to_video(output, "smashed_output.mp4", fps=15)
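To measure the effective speedup on your hardware, you can time a steady-state call now that the warm-up has already triggered compilation. This is a plain-Python timing sketch and assumes you are running on a CUDA device.
[ ]:
import time

# Time a steady-state generation (compilation already happened during warm-up).
torch.cuda.synchronize()
start = time.perf_counter()
_ = smashed_pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Bright tones, overexposed, static, blurred details.",
    height=480,
    width=480,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=50,
)
torch.cuda.synchronize()
print(f"Smashed generation took {time.perf_counter() - start:.1f} s")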
Wrap Up
Congratulations! You have successfully smashed a text-to-video model. You can now use the pruna_pro package to optimize any custom video generation model. The only parts you need to modify are steps 1 and 4 to fit your use case.