Turbocharge Text-to-Video Generation (Pro)

This tutorial shows how to use the pruna_pro package to speed up a text-to-video pipeline. We use the Wan2.1-T2V-1.3B model as the example.

1. Loading the Wan Text-to-Video Model

First, load your video generation model.

[ ]:
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
# Load the VAE in float32 and the rest of the pipeline in bfloat16, then move the pipeline to the GPU
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

2. Initializing the Smash Config

Next, initialize the SmashConfig and select the cacher and the compiler.

[ ]:
from pruna_pro import SmashConfig, smash

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['cacher'] = 'auto'
smash_config['auto_cache_mode'] = 'taylor'
smash_config['auto_speed_factor'] = 0.42  # Gives roughly a 2.5x speedup; lower values are faster but lose more quality
smash_config['compiler'] = "torch_compile"

3. Smashing the Model

Now you can smash the model, which takes about a minute. Don’t forget to replace the token with the one provided by PrunaAI.

[ ]:
# Smash the pipe
smashed_pipe = smash(
    model=pipe,
    token="<your_pruna_token>",
    smash_config=smash_config,
)
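Rather than pasting the token into the notebook, you may prefer to read it from the environment. The sketch below is one way to do that; the helper `read_pruna_token` and the variable name `PRUNA_TOKEN` are hypothetical choices for illustration, not something pruna_pro looks up on its own.

```python
import os


def read_pruna_token(env_var: str = "PRUNA_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    PRUNA_TOKEN is a hypothetical name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Pruna token.")
    return token
```

You could then pass `token=read_pruna_token()` to `smash` in place of the literal string, which keeps the token out of shared notebooks.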

4. Running the Model

Finally, run the model to generate the video with accelerated inference.

[ ]:
# Warm-up: the first run is slow because torch_compile compiles the model on the first call
output = smashed_pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Bright tones, overexposed, static, blurred details.",
    height=480,
    width=480,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=50,
)
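To check how much faster the smashed pipeline is on your hardware, you can time a few calls after the warm-up. The helper below is a minimal sketch, not part of pruna_pro: `time_calls` is a hypothetical name, and the optional `sync` argument is there so that queued GPU work is finished before the clock stops.

```python
import time
from typing import Callable, List, Optional


def time_calls(
    fn: Callable[[], object],
    runs: int = 3,
    warmup: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Call fn `warmup` times untimed, then return wall-clock seconds for `runs` timed calls."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `time_calls(lambda: smashed_pipe(prompt="A cat walks on the grass, realistic", num_frames=81, num_inference_steps=50), sync=torch.cuda.synchronize)` returns one timing per run. Running the same measurement on the original pipeline before smashing gives a baseline for comparison.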
[ ]:
output = smashed_pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Bright tones, overexposed, static, blurred details.",
    height=480,
    width=480,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=50,
).frames[0]
export_to_video(output, "smashed_output.mp4", fps=15)
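The length of the exported clip follows from the frame count and the playback rate passed to `export_to_video`. As a quick sanity check (the helper name `clip_duration_seconds` is hypothetical), the values used above work out as follows:

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip with num_frames frames shown at fps frames per second."""
    return num_frames / fps


# num_frames=81 and fps=15, as in the call above
print(clip_duration_seconds(81, 15))  # 5.4
```

Changing `fps` only changes playback speed; generating a longer clip means raising `num_frames`, which also increases generation time.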

Wrap Up

Congratulations! You have successfully smashed a text-to-video model. You can apply the same workflow to your own video generation model: only step 1 (loading the model) and step 4 (running it) need to change to fit your use case.