Welcome

Glad to have you here! At Pruna AI, we are an inference provider offering serverless model endpoints. We host performance models: models optimized to sit on the Pareto front of speed, cost, and quality. We also maintain pruna, an open-source optimization framework for developers who want to compress and optimize their own models.

Pruna endpoints

What are performance models?

Pruna hosts serverless endpoints for its models in collaboration with major inference providers in the industry, such as Replicate, Prodia, Runpod, Segmind, DeepInfra, and Wiro.

We also host our own performance models: models we optimize to sit on the Pareto front of speed, cost, and quality. These include serverless versions of well-known open-source models, optimized for production use, alongside our in-house models P-Image, P-Image-Edit, and P-Video. Get API access via the Pruna User Portal.

P-Image — Text-to-image
P-Image-Edit — Image editing
P-Video — Video generation
P-Image

Pruna’s P-Image is a performance text-to-image model delivering AI images in under one second. It combines speed, quality, prompt adherence, and reliable text rendering.

P-Image-Edit

Pruna’s P-Image-Edit is a state-of-the-art image editing model, offering fast, high-quality multi-image editing with excellent prompt following and text rendering.

P-Video

Pruna’s P-Video is a performance video generation model delivering state-of-the-art AI video in seconds, with support for long-form generation, image references, and audio syncing.

Sign Up

Get API access. Create an account and run inference via the Pruna User Portal.

https://dashboard.pruna.ai/login
API Reference

API Reference for the performance models.

https://docs.api.pruna.ai/guides/quickstart
All Models & Pricing

Browse all available models and their pricing.

https://www.pruna.ai/all-models
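
The cards above cover sign-up, the API reference, and pricing. As a rough illustration of what a text-to-image call over HTTP involves, the sketch below assembles a JSON POST request. The endpoint URL and JSON field names here are placeholders of our own, not the documented API; the quickstart guide linked above describes the real request format.

```python
import json
import os
import urllib.request

# NOTE: this URL is an illustrative placeholder, not the documented endpoint.
API_URL = "https://api.pruna.ai/v1/p-image"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble a JSON POST request for a text-to-image call (sketch only)."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "a photo of an astronaut riding a horse on mars",
    os.environ.get("PRUNA_API_KEY", "<your_api_key>"),
)
print(request.get_method())  # POST
```

Sending the request (e.g. with `urllib.request.urlopen(request)`) is left out here, since the response shape depends on the actual API.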

Why performance models?

Pruna endpoints offer significant advantages over running your own model endpoints from scratch, thanks to our integrated optimizations and cloud infrastructure partnerships. Our performance models are optimized for the Pareto front of speed, cost, and quality:

  • Faster: models are hosted and optimized for speed using the latest optimization algorithms.

  • Cheaper: model optimizations reduce hardware requirements and, with them, costs.

  • Better: good optimizations can be lossless, and those are our specialty.

Tip

Check out our benchmark comparison page for a head-to-head look at latency and price compared to self-hosting and other public endpoints. See how much time and money you can save.

Pruna Open Source

pruna is a free, open-source compression framework that allows you to compress and evaluate your models. Made by developers for developers, it provides seamless access to state-of-the-art compression algorithms through a consistent, easy-to-use interface.

Before smashing: 4.06s inference time
After smashing: 1.44s inference time
Install Pruna

Learn how to install pruna and use serving integrations.

Install Pruna
Smash your first model

Understand how to use pruna to compress and evaluate your models.

Smash your first model
Evaluate and benchmark your models

Learn how to benchmark and evaluate your optimized models with pruna.

Evaluate quality with the Evaluation Agent
Tutorials

Get familiar with end-to-end examples for various specific modalities and use cases.

Tutorials Pruna

How does it work? First, you need to install pruna:

pip install pruna

After installing pruna, you can start smashing your models in 4 easy steps:

  1. Load a pretrained model

  2. Create a SmashConfig

  3. Apply optimizations with the smash function

  4. Run inference with the optimized model

Let’s see how it works with an example:

import torch
from diffusers import StableDiffusionPipeline
from pruna import smash, SmashConfig

# Define the model you want to smash
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Initialize the SmashConfig and pick the algorithms to apply
smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"
smash_config["compiler"] = "stable_fast"

# Smash the model
smashed_model = smash(
    model=pipe,
    smash_config=smash_config,
)

# Run the model on a prompt
prompt = "a photo of an astronaut riding a horse on mars"
image = smashed_model(prompt).images[0]
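
The before/after timings quoted earlier (4.06s vs. 1.44s) come from measuring wall-clock inference latency. A minimal, model-agnostic way to take such a measurement yourself, averaging over a few runs after a warm-up call, might look like the sketch below; `run_inference` stands in for any callable, e.g. `lambda: smashed_model(prompt)`.

```python
import time
from typing import Callable


def measure_latency(run_inference: Callable[[], object],
                    warmup: int = 1, runs: int = 5) -> float:
    """Average wall-clock seconds per call, discarding warm-up runs."""
    for _ in range(warmup):
        run_inference()  # warm-up: compilation, caches, etc.
    start = time.perf_counter()
    for _ in range(runs):
        run_inference()
    return (time.perf_counter() - start) / runs


# Stand-in workload instead of a real model call, for demonstration:
latency = measure_latency(lambda: sum(range(100_000)), warmup=1, runs=3)
print(f"{latency:.6f}s per call")
```

Discarding warm-up runs matters for compiled and cached models, where the first call can be much slower than steady-state inference.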

Now that you’ve seen what pruna can do, it’s your turn!

Pruna Pro

pruna_pro is our premium offering that provides advanced compression algorithms and features to help you achieve better model compression results. It uses exactly the same interface as pruna, but offers additional features and algorithms.

Pruna Pro Guide

Learn how to transition to pruna_pro and access premium features.

Transition to Pruna Pro
Pro Tutorials

Learn how to use the pruna_pro features with end-to-end examples.

Tutorials Pruna Pro
Pro Features

Search for all the pro features and algorithms.

search.html?q=%28pro%29

How does it work? First, you need to install pruna_pro:

pip install pruna_pro

After installing pruna_pro, you use the exact same interface as pruna but with additional features:

from pruna_pro import smash  # instead of: from pruna import smash

smash(model, smash_config, token='<your_pruna_pro_token>') # add your token here
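
Rather than hardcoding a real token in source, a common pattern is to read it from an environment variable. The variable name `PRUNA_PRO_TOKEN` and the helper below are our own illustration, not part of the pruna_pro API:

```python
import os


def load_pro_token(env_var: str = "PRUNA_PRO_TOKEN") -> str:
    """Fetch the Pruna Pro token from the environment, failing loudly if unset."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set {env_var} first, e.g. export {env_var}=<your_pruna_pro_token>"
        )
    return token


# Usage, with pruna_pro installed and the variable exported:
# smashed = smash(model, smash_config, token=load_pro_token())
```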

Now that you’ve seen what pruna_pro can do, it’s your turn!

Pruna Community

We love organizing events and workshops, and there are many coming up! Find more info about our community and events in the Community section.