Blazingly fast Computer Vision Models

This tutorial demonstrates how to use the pruna package to optimize any custom computer vision model. We will use torchvision's vit_b_16 model as an example. All execution times given below were measured on a T4 GPU.

1. Loading the CV Model

First, load the pretrained ViT model and move it to the GPU.

[ ]:
import torchvision

# Load a pretrained ViT-B/16 with the default ImageNet weights and move it to the GPU
model = torchvision.models.vit_b_16(weights="ViT_B_16_Weights.DEFAULT").cuda()
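
If you want to quantify the speedup later, it helps to record the baseline latency before smashing. Below is a minimal sketch using plain PyTorch; the dummy input shape, the iteration counts, and the eval() call are illustrative choices, not part of the original tutorial.

[ ]:
import time

import torch

model.eval()
dummy_input = torch.randn(1, 3, 224, 224).cuda()

with torch.no_grad():
    # Warm up the unoptimized model first so the timing is stable
    for _ in range(5):
        model(dummy_input)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(20):
        model(dummy_input)
    torch.cuda.synchronize()
print(f"Baseline latency: {(time.perf_counter() - start) / 20 * 1000:.1f} ms")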

2. Initializing the Smash Config

Next, initialize the smash_config.

[ ]:
from pruna import SmashConfig

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config["compilers"] = "x-fast"

3. Smashing the Model

Now, you can smash the model, which will take around 5 seconds. Don’t forget to replace the token with the one provided by PrunaAI.

[ ]:
from pruna import smash

# Smash the model
smashed_model = smash(
    model=model,
    token='<your_token>',  # replace <your_token> with your actual token, or set to None if you do not have one yet
    smash_config=smash_config,
)

4. Preparing the Input

For demonstration purposes, we generate a random image and convert it to an input tensor on the GPU.

[ ]:
import numpy as np
from torchvision import transforms

# Generating a random image
image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
input_tensor = transforms.ToTensor()(image).unsqueeze(0).cuda()
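
In practice you will classify real images rather than random noise. Here is a minimal sketch, assuming a local file cat.jpg (a placeholder name) and using the preprocessing transforms bundled with the torchvision weights:

[ ]:
from PIL import Image
from torchvision.models import ViT_B_16_Weights

# Preprocessing pipeline that matches the pretrained weights (resize, crop, normalize)
preprocess = ViT_B_16_Weights.DEFAULT.transforms()

# "cat.jpg" is a placeholder; point this at any image on disk
image = Image.open("cat.jpg").convert("RGB")
input_tensor = preprocess(image).unsqueeze(0).cuda()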

5. Running the Model

After the model has been compiled, we run inference for a few iterations as warm-up, so that one-time compilation overhead does not show up in later timings. This will take around 8 seconds.

[ ]:
# Run some warm-up iterations
for _ in range(5):
    smashed_model(input_tensor)
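
With the model warmed up, you can measure its steady-state latency and compare it with the baseline number recorded in step 1. This is a minimal sketch mirroring the earlier measurement; the iteration count is an arbitrary choice.

[ ]:
import time

import torch

with torch.no_grad():
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(20):
        smashed_model(input_tensor)
    torch.cuda.synchronize()
print(f"Smashed latency: {(time.perf_counter() - start) / 20 * 1000:.1f} ms")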

Finally, run the model to classify the image with accelerated inference.

[ ]:
# Run the smashed model; in a notebook, the output logits are displayed
smashed_model(input_tensor)
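
The raw output is a tensor of logits over the 1,000 ImageNet classes. Here is a small sketch for turning it into a human-readable label, using the category names shipped with the torchvision weights. (With the random input from step 4, the predicted label is of course meaningless.)

[ ]:
import torch
from torchvision.models import ViT_B_16_Weights

with torch.no_grad():
    logits = smashed_model(input_tensor)

# Map the highest-scoring logit to its ImageNet class name
probs = logits.softmax(dim=-1)
class_id = int(probs.argmax(dim=-1))
categories = ViT_B_16_Weights.DEFAULT.meta["categories"]
print(categories[class_id], float(probs[0, class_id]))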

Wrap Up

Congratulations! You have successfully smashed a CV model. You can now use the pruna package to optimize any custom CV model. The only parts you need to modify are steps 1, 4, and 5 to fit your use case.