Caching for Custom Models (Pro)
This tutorial demonstrates how to apply pruna’s caching algorithms to nearly any diffusion or flow matching model. In this guide, we focus on the Hunyuan 3D 2.0 model. To follow along, make sure to install hy3dgen by following the instructions in the official Hunyuan3D-2 GitHub repository.
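As a quick sanity check after installation, the snippet below simply verifies that the packages used in this tutorial can be imported and that a CUDA device is visible. This is a minimal sketch, not part of the official installation instructions.
[ ]:
# Sanity check: these imports should succeed before continuing with the tutorial.
import torch
import hy3dgen
import pruna_pro

# The pipeline below is loaded on "cuda", so a GPU should be available.
print("CUDA available:", torch.cuda.is_available())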
1. Loading the Hunyuan 3D Model
First, load the Hunyuan 3D Model.
[ ]:
import torch
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
"tencent/Hunyuan3D-2mv", subfolder="hunyuan3d-dit-v2-mv", use_safetensors=True, device="cuda"
)
2. Initializing the Smash Config
Next, initialize the smash_config. Here you can configure auto, adaptive, or periodic caching. Make sure to set the respective custom_model argument to True.
[ ]:
from pruna_pro import SmashConfig, smash
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config["cacher"] = "auto"
smash_config["auto_cache_mode"] = "taylor"
smash_config["auto_speed_factor"] = 0.4 # Lower is faster, but reduces quality
smash_config["auto_custom_model"] = True
3. Smashing the Model
Now, you can smash the model, which will take a few seconds. Don’t forget to replace the token with the one provided by PrunaAI.
[ ]:
smashed_pipe = smash(
model=pipe,
token="<your_pruna_token>",
smash_config=smash_config,
experimental=True,
)
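If you prefer not to hard-code the token, you can read it from an environment variable and pass it as the token argument to smash. The variable name PRUNA_TOKEN below is just an example, not an official convention.
[ ]:
import os

# Example: read the Pruna token from an environment variable instead of
# hard-coding it (PRUNA_TOKEN is an example variable name).
pruna_token = os.environ.get("PRUNA_TOKEN", "<your_pruna_token>")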
4. Configure the Cache Helper
Before using the smashed pipe, set up the cache helper by providing the two key components of your pipe and how they are called:
Pipe: You need to supply the pipe object that manages the generation. For the Hunyuan3D pipeline, this is pipe, which is called using the __call__ method. Additionally, specify the argument that controls the number of inference steps.
Backbone: We assume that the pipe uses a backbone (e.g., a neural network, UNet, or transformer) that does most of the work during inference. For the Hunyuan3D pipeline, the backbone is pipe.model, and it is called using the forward method.
[ ]:
smashed_pipe.cache_helper.configure(
pipe=pipe,
pipe_call_method="__call__",
step_argument="num_inference_steps",
backbone=pipe.model,
backbone_call_method="forward",
)
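The same configure call carries over to other custom pipelines; only the attributes and method names change. As a hypothetical illustration, not part of this tutorial’s Hunyuan3D setup, a Stable Diffusion-style diffusers pipeline would typically use the denoising UNet as its backbone:
[ ]:
# Hypothetical example for a diffusers-style pipeline; the backbone attribute
# (e.g. pipe.unet or pipe.transformer) depends on the model architecture.
smashed_pipe.cache_helper.configure(
    pipe=pipe,
    pipe_call_method="__call__",
    step_argument="num_inference_steps",
    backbone=pipe.unet,
    backbone_call_method="forward",
)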
5. Running the Model
After the cache helper has been configured, you can run the model with accelerated inference.
[ ]:
# you can download the images from the assets folder of https://huggingface.co/tencent/Hunyuan3D-2
mesh = smashed_pipe(
image={"front": "front.png", "left": "left.png", "back": "back.png"},
num_inference_steps=50,
octree_resolution=200,
num_chunks=20000,
generator=torch.manual_seed(12345),
output_type="trimesh",
)[0]
mesh.show()
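To get a rough idea of the latency, you can time the call with the standard library. The snippet below is a simple sketch; run the pipeline once beforehand as a warm-up so one-off overheads are excluded. Since the output is a trimesh object, it can also be saved to disk with mesh.export.
[ ]:
import time

# Rough wall-clock timing of the cached pipeline (do a warm-up run first).
start = time.perf_counter()
mesh = smashed_pipe(
    image={"front": "front.png", "left": "left.png", "back": "back.png"},
    num_inference_steps=50,
    output_type="trimesh",
)[0]
print(f"Generation took {time.perf_counter() - start:.1f} s")

# Save the result; trimesh supports common formats such as .glb or .obj.
mesh.export("hunyuan3d_mesh.glb")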
Wrap Up
Congratulations! You have successfully smashed a custom model. Enjoy the speed-up!