Our benchmark performance
Pruna benchmarked several models to showcase the performance gains of its optimized versions.
Client Models
Note
The benchmark results below reflect performance at the time of testing and may not represent our current capabilities. For the latest inference speeds, see the public endpoints provided or reach out for a dedicated benchmark. Public models may appear under their original providers, as Pruna delivers optimization seamlessly as a white-label solution.
Pruna made Wan 2.2 Image 2.4x faster than Seedream and 1.8x faster than Flux-1.1 Pro on a single H100 GPU.
Last updated: August 2025
Pruna made Wan 2.2 run up to 10x faster than the base model on a single H100 GPU.
Last updated: July 2025
Pruna made Wan 2.1 Image 3.6x faster than Seedream and 1.41x faster than Flux-1.1 Pro on a single H100 GPU.
Last updated: July 2025
Pruna made Flux-Kontext run up to 4.9x faster than the base model on an H100 GPU.
Last updated: June 2025
Pruna made BRIA3.2 run up to 3.6x faster than the base model on an L40S GPU.
Last updated: June 2025
Pruna made Llama 3.1-8B-Instruct run up to 1.9x faster than vLLM alone on an L40S GPU.
Last updated: June 2025
Pruna made Flux-Dev run up to 2.8x faster than Together AI, Fireworks AI, and fal’s APIs on H100 GPUs.
Last updated: April 2025
Pruna made SmolLM2-135M-Instruct run up to 2x faster and 7x smaller than the base model on CPU.
Last updated: January 2025
Pruna made Flux-Schnell run up to 3x faster than the compiled base model on a GPU.
Last updated: November 2024
InferBench
The InferBench leaderboard compares the performance of different inference providers and their endpoints. We evaluate various providers to understand the real-world performance differences when using the same model through different services.
These providers offer managed endpoints but rarely disclose the optimization methods they use behind the scenes. Because response times and performance characteristics vary between endpoints, it is crucial to benchmark against your own specific use case and requirements.
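As an illustration, running such a comparison yourself can be as simple as timing repeated requests against each endpoint. The sketch below measures end-to-end latency for a hypothetical HTTP inference endpoint; the URL, headers, and payload fields are placeholders for your provider's actual API, not any specific service.

```python
import statistics
import time

import requests

# Hypothetical endpoint, key, and payload -- replace with your provider's real API.
ENDPOINT = "https://api.example.com/v1/images/generate"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
PAYLOAD = {"prompt": "a photo of an astronaut riding a horse", "num_inference_steps": 28}


def benchmark_endpoint(warmup: int = 2, runs: int = 10) -> None:
    """Measure end-to-end request latency, discarding warmup requests."""
    # Warmup requests absorb cold-start effects (model loading, connection setup).
    for _ in range(warmup):
        requests.post(ENDPOINT, json=PAYLOAD, headers=HEADERS, timeout=300)

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        response = requests.post(ENDPOINT, json=PAYLOAD, headers=HEADERS, timeout=300)
        response.raise_for_status()
        latencies.append(time.perf_counter() - start)

    print(f"min:    {min(latencies):.2f}s")
    print(f"median: {statistics.median(latencies):.2f}s")
    print(f"max:    {max(latencies):.2f}s")


if __name__ == "__main__":
    benchmark_endpoint()
```

Reporting the median alongside min and max matters because endpoint latency is often skewed by occasional slow requests; comparing providers on a single run can be misleading.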