Welcome to Pruna’s documentation!

Glad to have you here! At Pruna AI, we create solutions that empower developers to make their ML models smaller, cheaper, faster and greener.

Our compression framework, pruna, is made by developers, for developers. It makes your life easier by providing seamless integration of state-of-the-art compression algorithms: in just a few lines of code, you can apply a diverse set of compression algorithms to your model and evaluate the results, all through a consistent and easy-to-use interface.

For example, smashing the Stable Diffusion pipeline below reduced inference time from 4.06s to 1.44s.

How does it work? Let us show you. After setting up pruna, you can start smashing your models:

import torch
from diffusers import StableDiffusionPipeline
from pruna import smash, SmashConfig

# Define the model you want to smash
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Initialize the SmashConfig: compile the model with stable_fast
# and cache intermediate diffusion outputs with deepcache
smash_config = SmashConfig()
smash_config['compiler'] = 'stable_fast'
smash_config['cacher'] = 'deepcache'

# Smash the model
smashed_model = smash(
    model=pipe,
    smash_config=smash_config,
)

# Run the model on a prompt
prompt = "a photo of an astronaut riding a horse on mars"
image = smashed_model(prompt).images[0]
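
Wondering where the before/after timings above come from? They are simply the pipeline's average latency, measured on the same prompt before and after smashing. Below is a minimal sketch of such a measurement using only time and torch; the time_pipeline helper is our own illustration, not part of pruna's API. Note that in practice you would time the baseline before calling smash, since smashing can modify the original pipeline in place.

import time
import torch

def time_pipeline(pipeline, prompt, runs=3):
    # Warm-up run so one-off compilation and cache setup don't skew the numbers
    pipeline(prompt)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        pipeline(prompt)
    # Wait for all queued GPU work to finish before stopping the clock
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

print(f"Average inference time: {time_pipeline(smashed_model, prompt):.2f}s")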

Now that you’ve seen what pruna can do, it’s your turn! Ready to take your models to the next level? Get started today and see how easy it is to reduce costs and boost efficiency. Whether you want to fit models on your local machine or trim your cloud compute bill, our tools are here to support you every step of the way.