Deploy Pruna models

Pruna offers deployment integrations with the following tools to supercharge your workflows.

Pruna is the bridge to the broader AI ecosystem, making sure your optimized models run smoothly across popular deployment and inference platforms. Whether you’re running on Docker, deploying with NVIDIA Triton Server, or serving with vLLM, Pruna fits right in.

Docker

Deploy Pruna in Docker containers for reproducible, GPU-accelerated environments.

docker_tutorial.rst
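As a minimal sketch of what such a container can look like (the base-image tag, entrypoint script, and install steps here are illustrative assumptions, not taken from the tutorial — see docker_tutorial.rst for the supported setup):

```dockerfile
# Illustrative sketch only: base image, entrypoint, and paths are assumptions.
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Pruna is distributed on PyPI as `pruna`.
RUN pip3 install pruna

# Hypothetical serving script for your optimized model.
COPY serve.py /app/serve.py
CMD ["python3", "/app/serve.py"]
```

At runtime, pass the GPUs through to the container with `docker run --gpus all`.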
NVIDIA Triton Server

Production-scale AI deployments with NVIDIA's high-performance inference server.

tritonserver.rst
vLLM

High-performance LLM serving with model-level optimizations.

vllm.rst
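As a hedged illustration of the serving side (the model ID and port below are placeholder assumptions), vLLM exposes an OpenAI-compatible API server from the command line:

```
# Illustrative: the model ID and port are placeholders, not from the tutorial.
vllm serve my-org/my-optimized-model --port 8000
```

See vllm.rst for how Pruna-optimized models plug into this workflow.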
AWS AMI

Amazon Machine Images for running Pruna-optimized models on AWS.

ami.rst
Replicate

An inference platform for running machine learning models in production.

replicate.rst
Koyeb

A serverless platform for deploying and scaling AI applications.

koyeb.rst
Lightning AI LitServe

A flexible, FastAPI-based serving engine for self-hosting AI models.

litserve.rst