Caching for Custom Models (Pro)

This tutorial demonstrates how to apply pruna's caching algorithms to nearly any diffusion or flow matching model. In this guide, we focus on the Hunyuan 3D 2.0 model. To follow along, be sure to install hy3dgen by following the instructions in the official Hunyuan3D-2 GitHub repository.
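
For reference, installation typically amounts to cloning the repository and installing its requirements. The commands below are only an illustrative sketch (the repository URL and requirements file name are assumed here); the repository's README remains the authoritative source.

# Illustrative sketch only -- follow the official Hunyuan3D-2 README for the exact, up-to-date steps.
!git clone https://github.com/Tencent/Hunyuan3D-2.git
%cd Hunyuan3D-2
!pip install -r requirements.txt  # assumed requirements file name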

1. Loading the Hunyuan 3D Model

First, load the Hunyuan 3D model from the Hugging Face Hub.

[ ]:
import torch
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline

pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
    "tencent/Hunyuan3D-2mv", subfolder="hunyuan3d-dit-v2-mv", use_safetensors=True, device="cuda"
)

2. Initializing the Smash Config

Next, initialize the smash_config. Here you can configure auto, adaptive, or periodic caching. Make sure to set the respective custom_model argument to True.

[ ]:
from pruna_pro import SmashConfig, smash

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config["cacher"] = "auto"
smash_config["auto_cache_mode"] = "taylor"
smash_config["auto_speed_factor"] = 0.4  # Lower is faster, but reduces quality
smash_config["auto_custom_model"] = True

3. Smashing the Model

Now you can smash the model, which takes a few seconds. Don't forget to replace the token with the one provided by PrunaAI.

[ ]:
smashed_pipe = smash(
    model=pipe,
    token="<your_pruna_token>",
    smash_config=smash_config,
    experimental=True,
)

4. Configuring the Cache Helper

Before using the smashed pipe, set up the cache helper by providing the two key components of your pipe and how they are called:

  • Pipe: You need to supply the pipe object that manages the generation. For the Hunyuan3D pipeline, this is pipe, which is called using the __call__ method. Additionally, specify the argument that controls the number of inference steps.

  • Backbone: We assume that the pipe uses a backbone (e.g., a neural network such as a UNet or transformer) that does most of the work during inference. For the Hunyuan3D pipeline, the backbone is pipe.model, and it is called using the forward method.

[ ]:
smashed_pipe.cache_helper.configure(
    pipe=pipe,                            # the object that manages generation
    pipe_call_method="__call__",          # how the pipe is invoked
    step_argument="num_inference_steps",  # the argument controlling the number of steps
    backbone=pipe.model,                  # the network doing most of the inference work
    backbone_call_method="forward",       # how the backbone is invoked
)

5. Running the Model

After the cache helper has been configured, you can run the model with accelerated inference.

[ ]:
# you can download the images from the assets folder of https://huggingface.co/tencent/Hunyuan3D-2
mesh = smashed_pipe(
    image={"front": "front.png", "left": "left.png", "back": "back.png"},
    num_inference_steps=50,
    octree_resolution=200,
    num_chunks=20000,
    generator=torch.manual_seed(12345),
    output_type="trimesh",
)[0]

mesh.show()
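
If you want to quantify the gain on your own hardware, a simple wall-clock measurement works. This is a minimal sketch using only Python's time module and the same call as above; the absolute numbers depend on your GPU and the chosen auto_speed_factor.

import time

# Time one cached generation; you can run the same call on the unsmashed pipe for comparison.
start = time.perf_counter()
_ = smashed_pipe(
    image={"front": "front.png", "left": "left.png", "back": "back.png"},
    num_inference_steps=50,
    octree_resolution=200,
    num_chunks=20000,
    generator=torch.manual_seed(12345),
    output_type="trimesh",
)[0]
print(f"Cached inference took {time.perf_counter() - start:.1f} s")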

Wrap Up

Congratulations! You have successfully smashed a custom model. Enjoy the speed-up!