Welcome to Pruna

Glad to have you here! At Pruna AI, we create solutions that empower developers to make their ML models smaller, cheaper, faster, and greener.

Example speedup: 4.06s inference time before smashing, 1.44s after.

Our compression framework pruna is made by developers, for developers. It is designed to make your life easier by providing seamless integration of state-of-the-art compression algorithms. In just a few lines of code, pruna lets you apply a diverse range of compression algorithms and evaluate their performance, all through a consistent and easy-to-use interface.

Pruna Open Source

pruna is a free and open-source compression framework that allows you to compress and evaluate your models.

Install Pruna

Learn how to install pruna and use serving integrations.

/setup/install
Smash your first model

Understand how to use pruna to compress and evaluate your models.

/docs_pruna/user_manual/smash
Evaluate and benchmark your models

Learn how to benchmark and evaluate your optimized models with pruna.

/docs/user_manual/evaluate
Tutorials

Get familiar with end-to-end examples for various specific modalities and use cases.

/docs_pruna/tutorials/index

Quickstart

How does it work? First, you need to install pruna:

pip install pruna
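
To check that the install worked, you can print the installed version. This sketch uses standard Python packaging metadata rather than any pruna-specific command:

import importlib.metadata

# Prints the version of the installed pruna distribution
print(importlib.metadata.version("pruna"))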

After installing pruna, you can start smashing your models in 4 easy steps:

  1. Load a pretrained model

  2. Create a SmashConfig

  3. Apply optimizations with the smash function

  4. Run inference with the optimized model

Let’s see how it works with an example:

import torch
from diffusers import StableDiffusionPipeline
from pruna import smash, SmashConfig

# Define the model you want to smash
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['compiler'] = 'stable_fast'  # compile the pipeline with stable-fast
smash_config['cacher'] = 'deepcache'      # cache UNet features across denoising steps with DeepCache

# Smash the model
smashed_model = smash(
    model=pipe,
    smash_config=smash_config,
)

# Run the model on a prompt
prompt = "a photo of an astronaut riding a horse on mars"
image = smashed_model(prompt).images[0]
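
To reproduce a before/after comparison like the one above, you can time the pipeline yourself: run the helper below on the pipeline before calling smash and on the smashed model afterwards. This is a minimal sketch using plain wall-clock timing, not pruna's built-in evaluation tooling; the helper name average_inference_time is hypothetical, and absolute numbers depend on your hardware.

import time
import torch

def average_inference_time(pipeline, prompt, runs=3):
    # Hypothetical helper (not part of pruna): warm up once so compilation
    # and caching are not counted, then average wall-clock time over a few runs.
    pipeline(prompt)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        pipeline(prompt)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

print(f"Smashed pipeline: {average_inference_time(smashed_model, prompt):.2f}s per image")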

Now that you’ve seen what pruna can do, it’s your turn!

Pruna Pro

pruna_pro is our premium offering. It uses exactly the same interface as pruna, but adds advanced compression algorithms and features that help you achieve better compression results.

Pruna Pro Guide

Learn how to transition to pruna_pro and access premium features.

/docs_pruna_pro/user_manual/pruna_pro
Optimization Agent

Use the pruna_pro Optimization Agent to find the best compression configuration for your models.

/docs_pruna_pro/user_manual/optimization_agent
Pro Tutorials

Learn how to use the pruna_pro features with end-to-end examples.

/docs_pruna_pro/tutorials/index
Pro Features

Search for all the pro features and algorithms.

/search.html?q=%28pro%29

Quickstart

How does it work? First, you need to install pruna_pro:

pip install pruna_pro

After installing pruna_pro, you use the exact same interface as pruna but with additional features:

from pruna_pro import smash  # instead of: from pruna import smash

smashed_model = smash(model, smash_config, token='<your_pruna_pro_token>')  # add your token here
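
Applied to the Stable Diffusion Quickstart above, the switch looks roughly like this. This sketch assumes that pruna_pro re-exports SmashConfig alongside smash (check the Pruna Pro guide for the exact imports) and that you replace '<your_pruna_pro_token>' with your own token:

import torch
from diffusers import StableDiffusionPipeline
from pruna_pro import SmashConfig, smash  # assumed re-export; see the Pruna Pro guide

# Same pipeline as in the open-source Quickstart
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

smash_config = SmashConfig()
smash_config['compiler'] = 'stable_fast'
smash_config['cacher'] = 'deepcache'

smashed_model = smash(
    model=pipe,
    smash_config=smash_config,
    token='<your_pruna_pro_token>',  # your pruna_pro token
)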

Now that you’ve seen what pruna_pro can do, it’s your turn!

Pruna Community

We love organizing events and workshops, and there are many coming up! You can find more info about our community and events in the Community section.