Turbocharge Diffusion Video Generation (Pro)
This tutorial demonstrates how to use the pruna_pro
package to optimize a video generation pipeline, using the HunyuanVideo
model as an example. Any execution times given below were measured on an A10G GPU; note that this tutorial requires at least 21GB of GPU memory.
1. Loading the HunyuanVideo Model
First, load your video generation model.
[ ]:
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "tencent/HunyuanVideo"

# Load the transformer in bfloat16 to keep its memory footprint manageable
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    revision="refs/pr/18",
)

# Load the rest of the pipeline in float16 and move it to the GPU
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.float16,
    revision="refs/pr/18",
).to("cuda")
2. Initializing the Smash Config
Next, initialize the smash_config.
[ ]:
from pruna_pro import smash, SmashConfig

# Initialize the SmashConfig and select the adaptive cacher,
# which skips redundant computation across denoising steps
smash_config = SmashConfig()
smash_config['cacher'] = 'adaptive'
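SmashConfig behaves like a dictionary: each key selects one algorithm per optimization group (cacher, compiler, and so on). As an illustration only, the commented line below would stack a compiler on top of the cacher; the 'torch_compile' value mirrors pruna's open-source compiler naming and is an assumption here, so check which algorithms your pruna_pro version offers.
[ ]:
# SmashConfig is dict-like: one algorithm per group (cacher, compiler, ...)
# The compiler line is an assumption based on pruna's open-source naming:
# smash_config['compiler'] = 'torch_compile'
print(smash_config)  # inspect which algorithms are currently selected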
3. Smashing the Model
Now, you can smash the model; on an A10G this takes around 40 seconds. Don’t forget to replace the token with the one provided by PrunaAI.
[ ]:
# Smash the model
smashed_model = smash(
    model=pipe,
    token="<your_pruna_token>",
    smash_config=smash_config,
)
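If you want to skip the smashing step in future sessions, pruna models can usually be written to disk and reloaded. The commented calls below mirror the save_pretrained / from_pretrained pattern from pruna's open-source documentation; treat the exact loader import as an assumption and check the pruna_pro docs for your version.
[ ]:
# Assumption: save/load follows pruna's open-source PrunaModel pattern
# smashed_model.save_pretrained("hunyuan_video_smashed/")
# from pruna_pro import PrunaModel  # loader import is an assumption
# smashed_model = PrunaModel.from_pretrained("hunyuan_video_smashed/")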
4. Running the Model
Finally, run the smashed model to generate the video with accelerated inference.
[ ]:
output = smashed_model(
    prompt="A cat walks on the grass, realistic",
    height=256,
    width=256,
    num_frames=129,
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(1),
).frames[0]
export_to_video(output, "hunyuan_video_smashed.mp4", fps=15)
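To quantify the speedup on your own hardware, you can wrap a generation in CUDA events. The sketch below uses only standard torch APIs and simply re-runs the call from above, so expect it to take as long as a full generation.
[ ]:
# Time one full generation with CUDA events (standard torch, nothing pruna-specific)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
_ = smashed_model(
    prompt="A cat walks on the grass, realistic",
    height=256,
    width=256,
    num_frames=129,
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(1),
).frames[0]
end.record()
torch.cuda.synchronize()

print(f"Generation took {start.elapsed_time(end) / 1000:.1f} s")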
Wrap Up
Congratulations! You have successfully smashed a diffusion video generation model. You can now use the pruna_pro
package to optimize any custom diffusion video generation pipeline. The only parts you should need to modify are steps 1 and 4 to fit your use case.