Tutorials Pruna Pro

These tutorials guide you through common use cases of pruna_pro. Looking for pruna tutorials? Check out the Tutorials Pruna page.

Flux in a heartbeat

Optimize your Flux model for faster inference with periodic caching and torch.compile.

./flux_fast.ipynb
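
As a rough intuition for the periodic caching mentioned above, here is a minimal toy sketch. It assumes that periodic caching means recomputing an expensive block only every N-th step and reusing the stored result in between. The names `run_with_periodic_cache`, `toy_block`, and `cache_interval` are hypothetical and are not part of the pruna_pro API; the notebook shows the actual usage.

```python
from typing import Callable, List, Tuple


def run_with_periodic_cache(
    expensive_block: Callable[[int], int],
    inputs: List[int],
    cache_interval: int = 2,
) -> Tuple[List[int], List[int]]:
    """Toy illustration: recompute every `cache_interval` steps, reuse otherwise."""
    cached = None
    outputs = []
    computed_steps = []  # steps at which the expensive block actually ran
    for step, x in enumerate(inputs):
        if cached is None or step % cache_interval == 0:
            cached = expensive_block(x)
            computed_steps.append(step)
        outputs.append(cached)
    return outputs, computed_steps


def toy_block(x: int) -> int:
    # Hypothetical stand-in for an expensive model block.
    return x * x


outputs, computed_steps = run_with_periodic_cache(
    toy_block, [1, 2, 3, 4, 5], cache_interval=2
)
```

With an interval of 2, the block runs on only three of the five steps; the remaining steps reuse a slightly stale result, which is the speed-for-accuracy trade-off that caching makes.
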
Shrink and accelerate SANA

Shrink and accelerate SANA using torchao_autoquant quantization.

./sana_torchao_autoquant.ipynb
Speed up diffusion models with caching

Speed up diffusion models with auto caching.

./sd_auto_caching.ipynb
Turbocharge video generation

Speed up HunyuanVideo video generation with adaptive caching.

./video.ipynb
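
To give a feel for how an adaptive scheme differs from a fixed schedule, here is a minimal toy sketch. It assumes that "adaptive" means recomputing only when the input has drifted far enough from the input used for the last real computation. That criterion, along with the names `run_with_adaptive_cache` and `threshold`, is an illustrative assumption; the rule pruna_pro actually applies may differ, so refer to the notebook for real usage.

```python
from typing import Callable, List, Tuple


def run_with_adaptive_cache(
    expensive_block: Callable[[float], float],
    inputs: List[float],
    threshold: float,
) -> Tuple[List[float], List[int]]:
    """Toy illustration: recompute only when the input drifts past `threshold`."""
    cached_input = None
    cached_output = None
    outputs = []
    computed_steps = []  # steps at which the expensive block actually ran
    for step, x in enumerate(inputs):
        # Assumed criterion: distance from the input of the last real computation.
        if cached_input is None or abs(x - cached_input) > threshold:
            cached_output = expensive_block(x)
            cached_input = x
            computed_steps.append(step)
        outputs.append(cached_output)
    return outputs, computed_steps


adaptive_outputs, adaptive_steps = run_with_adaptive_cache(
    lambda x: x * x, [0, 1, 2, 10, 11, 30], threshold=5
)
```

Unlike a fixed interval, the number of recomputations here depends on the data: slowly changing inputs trigger few recomputations, while rapidly changing ones trigger many.
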
Accelerating inference in vLLM serving

Serve large language models with vLLM and pruna_pro optimization.

./vllm.ipynb
Caching for Custom Models

Apply caching algorithms to nearly any diffusion or flow-matching model.

./custom_caching.ipynb
Accelerating vLLM with Higgs quantizer

Optimize any LLM with Higgs quantization and serve it with vLLM.

./vllm_higgs.ipynb