Turbocharge Diffusion Video Generation (Pro)

This tutorial demonstrates how to use the pruna package to optimize a video generation pipeline, using the HunyuanVideo model as an example. The execution times given below were measured on an A10G GPU. The tutorial requires at least 21 GB of GPU memory.
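Before loading the model, it can help to confirm that the GPU meets the memory requirement stated above. The following is a minimal sketch and not part of pruna. The helper names `has_enough_gpu_memory` and `cuda_total_memory_bytes` are illustrative, and interpreting the 21 GB figure as GiB is an assumption.

```python
def has_enough_gpu_memory(total_bytes: int, required_gb: float = 21.0) -> bool:
    """Return True if total_bytes covers required_gb (interpreted as GiB, an assumption)."""
    return total_bytes >= required_gb * 1024**3


def cuda_total_memory_bytes(device_index: int = 0) -> int:
    """Total memory of a CUDA device in bytes, or 0 if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_properties(device_index).total_memory


if __name__ == "__main__":
    total = cuda_total_memory_bytes()
    print(f"GPU memory: {total / 1024**3:.1f} GiB")
    print(f"Enough for this tutorial: {has_enough_gpu_memory(total)}")
```

Note that this checks total device memory, not the memory currently free, so other processes on the same GPU can still cause an out-of-memory error.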

1. Loading the Video Generation Model

First, load your video generation model.

[ ]:
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "tencent/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    revision="refs/pr/18",
    cache_dir="/efs/hf_cache",
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.float16,
    revision="refs/pr/18",
    cache_dir="/efs/hf_cache",
).to("cuda")

2. Initializing the Smash Config

Next, initialize the smash_config and select the adaptive cacher.

[ ]:
from pruna_pro import smash, SmashConfig

# Initialize the SmashConfig
smash_config = SmashConfig()

smash_config['cacher'] = 'adaptive'

3. Smashing the Model

Now you can smash the model, which takes around 40 seconds. Don’t forget to replace the token with the one provided by PrunaAI.

[ ]:
# Smash the model
smashed_model = smash(
    model=pipe,
    token="<your_pruna_token>",
    smash_config=smash_config,
)
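To avoid committing the token to a notebook, one option is to read it from an environment variable. This is a sketch under assumptions: the variable name `PRUNA_TOKEN` and the helper `load_pruna_token` are hypothetical choices for illustration, not something pruna defines.

```python
import os


def load_pruna_token(env_var: str = "PRUNA_TOKEN") -> str:
    """Read the token from an environment variable.

    The default variable name is a hypothetical choice, not a pruna convention.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Pruna token.")
    return token
```

With this helper, the call above could pass `token=load_pruna_token()` instead of a hardcoded string.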

4. Running the Model

Finally, run the model to generate the video with accelerated inference.

[ ]:
output = smashed_model(
    prompt="A cat walks on the grass, realistic",
    height=256,
    width=256,
    num_frames=129,
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(1)
).frames[0]
export_to_video(output, "hunyuan_video_smashed.mp4", fps=15)
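To check the effect of smashing on your own hardware, you can time a generation call. The helper below is a generic sketch. The name `time_call` and its parameters are illustrative and not part of pruna or diffusers. The optional `sync` argument exists because CUDA kernels run asynchronously, so the device should be synchronized before reading the clock.

```python
import time
from typing import Callable, Optional


def time_call(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    repeats: int = 1,
) -> float:
    """Return the average wall-clock time in seconds of one call to fn."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if sync is not None:
        sync()  # make sure earlier GPU work is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for pending GPU work before stopping the clock
    return (time.perf_counter() - start) / repeats
```

For example, wrapping the generation call above in a lambda and passing `sync=torch.cuda.synchronize` gives the latency of one run. Timing the model once before step 3 and once after gives a before-and-after comparison.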

Wrap Up

Congratulations! You have successfully smashed a diffusion video generation model. You can now use the pruna package to optimize any custom diffusion video generation model. Only steps 1 and 4 need to change to fit your use case.