Our benchmark performance
Pruna benchmarked several models to showcase the performance gains of its optimized versions.
Client Models
Note
The benchmark results below reflect performance at the time of testing and may not represent our current capabilities. For the latest inference speeds, see the public endpoints provided or reach out for a dedicated benchmark. Public models may appear under their original providers, as Pruna delivers optimization seamlessly as a white-label solution.
Pruna made Wan 2.2 Image 2.4x faster than Seedream and 1.8x faster than Flux-1.1 Pro on a single H100 GPU.
Last updated: August 2025
Pruna made Wan 2.2 run up to 10x faster than the base model on a single H100 GPU.
Last updated: July 2025
Pruna made Wan 2.1 Image 3.6x faster than Seedream and 1.41x faster than Flux-1.1 Pro on a single H100 GPU.
Last updated: July 2025
Pruna made Flux-Kontext run up to 4.9x faster than the base model on an H100 GPU.
Last updated: June 2025
Pruna made BRIA3.2 run up to 3.6x faster than the base model on an L40S GPU.
Last updated: June 2025
Pruna made Llama 3.1-8B-Instruct run up to 1.9x faster than vLLM alone on an L40S GPU.
Last updated: June 2025
Pruna made Flux-Dev run up to 2.8x faster than Together AI, Fireworks AI, and fal’s APIs on H100 GPUs.
Last updated: April 2025
Pruna made SmolLM2-135M-Instruct run up to 2x faster and 7x smaller than the base model on CPU.
Last updated: January 2025
Pruna made Flux-Schnell run up to 3x faster than the compiled base model on a GPU.
Last updated: November 2024
InferBench
The InferBench leaderboard compares the performance of different inference providers and their endpoints. We evaluate various providers to understand the real-world performance differences when using the same model through different services.
These providers offer managed endpoints but rarely disclose the optimization methods they use behind the scenes. Because response times and performance characteristics vary between endpoints, it is crucial to benchmark against your own specific use case and requirements.
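As an illustration, running such a comparison yourself can be as simple as timing repeated requests against each endpoint. The sketch below measures end-to-end latency for a hypothetical HTTP inference endpoint; the URL, headers, and payload fields are placeholders for your provider's actual API, not any specific service.

```python
import statistics
import time

import requests

# Hypothetical endpoint, key, and payload -- replace with your provider's real API.
ENDPOINT = "https://api.example.com/v1/images/generate"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
PAYLOAD = {"prompt": "a photo of an astronaut riding a horse", "num_inference_steps": 28}


def benchmark_endpoint(warmup: int = 2, runs: int = 10) -> None:
    """Measure end-to-end request latency, discarding warmup requests."""
    # Warmup requests absorb cold-start effects (model loading, connection setup).
    for _ in range(warmup):
        requests.post(ENDPOINT, json=PAYLOAD, headers=HEADERS, timeout=300)

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        response = requests.post(ENDPOINT, json=PAYLOAD, headers=HEADERS, timeout=300)
        response.raise_for_status()
        latencies.append(time.perf_counter() - start)

    print(f"min:    {min(latencies):.2f}s")
    print(f"median: {statistics.median(latencies):.2f}s")
    print(f"max:    {max(latencies):.2f}s")


if __name__ == "__main__":
    benchmark_endpoint()
```

Reporting the median alongside min and max matters because endpoint latency is often skewed by occasional slow requests; comparing providers on a single run can be misleading.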