Welcome to Pruna’s documentation!

Glad to have you here! At Pruna AI, we create solutions that empower developers to make their ML models smaller, cheaper, faster and greener.

Our compression framework, pruna, is built by developers for developers. It is designed to make your life easier by giving you seamless access to state-of-the-art compression algorithms: in just a few lines of code, you can apply a diverse range of compression techniques to your model and evaluate their performance, all through a consistent and easy-to-use interface.

To give you a taste of the speedups this can unlock:

Before smashing: 4.06s inference time
After smashing: 1.44s inference time

How does it work? First, install pruna:

pip install pruna
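
To confirm the installation succeeded, you can import the package in Python (this assumes pruna exposes a __version__ attribute, as most Python packages do):

import pruna

# Print the installed version; raises ImportError if the install did not work
print(pruna.__version__)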

After installing pruna, you can start smashing your models in 4 easy steps:

  1. Load a pretrained model

  2. Create a SmashConfig

  3. Apply optimizations with the smash function

  4. Run inference with the optimized model

Let’s see how it works with an example:

import torch
from diffusers import StableDiffusionPipeline
from pruna import smash, SmashConfig

# Step 1: load the pretrained model you want to smash
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Step 2: initialize the SmashConfig and choose the algorithms to apply:
# compile the model with stable_fast and reuse intermediate results
# across diffusion steps with the deepcache cacher
smash_config = SmashConfig()
smash_config['compiler'] = 'stable_fast'
smash_config['cacher'] = 'deepcache'

# Smash the model
smashed_model = smash(
    model=pipe,
    smash_config=smash_config,
)

# Run the model on a prompt
prompt = "a photo of an astronaut riding a horse on mars"
image = smashed_model(prompt).images[0]
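
If you want to measure the speedup on your own hardware, here is a minimal timing sketch (nothing pruna-specific: it simply times the call, with a warm-up run so one-time compilation cost is excluded, and torch.cuda.synchronize because CUDA kernels execute asynchronously):

import time

# Warm-up run: triggers compilation and fills caches
smashed_model(prompt)

# Timed run
torch.cuda.synchronize()
start = time.perf_counter()
smashed_model(prompt)
torch.cuda.synchronize()
print(f"Inference time: {time.perf_counter() - start:.2f}s")

Your exact numbers will vary with your GPU, prompt, and number of inference steps.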

Now that you’ve seen what pruna can do, it’s your turn!