Tutorials Pruna

These tutorials will guide you through the process of using pruna to optimize your models. Looking for pruna_pro tutorials? Check out the Tutorials Pruna Pro page.

Compress and Evaluate Image Generation Models

Compress with a hqq_diffusers quantizer and a deepcache cacher, and evaluate with throughput, total time, and clip_score.

./image_generation.ipynb

Compress and Evaluate Video Generation Models

Compress with a torch_compile compiler and a flash_attn3 kernel, and evaluate with total time, latency, throughput, co2_emissions, and energy_consumed.

./video_generation.ipynb

Compress and Evaluate Large Language Models

Compress with hqq quantization and torch_compile compilation and evaluate with elapsed_time and perplexity.

./llms.ipynb

Compress and Evaluate Reasoning Large Language Models

Compress with hqq quantization and torch_compile compilation, and evaluate with total time, perplexity, throughput, and energy_consumed.

./reasoning_llm.ipynb

Transcribe 2 hours of audio in 2 minutes with Whisper

Speed up ASR using c_whisper compilation and whisper_s2t batching.

./asr_tutorial.ipynb

Smash your Computer Vision model with only a CPU

Compile your model with torch_compile and openvino for faster inference.

./cv_cpu.ipynb

Speed up and Quantize any Diffusion Model

Speed up diffusers with torch_compile compilation and hqq_diffusers quantization.

./diffusion_quantization_acceleration.ipynb

Evaluating with CMMD using EvaluationAgent

Evaluate image generation quality with CMMD and EvaluationAgent.

./evaluation_agent_cmmd.ipynb

x2 smaller Sana diffusers in action

Optimize your diffusion model with hqq_diffusers quantization in 8 bits.

./sana_diffusers_int8.ipynb

Compress and Evaluate Flux2 Image Generation (Klein 4B)

Optimize Flux2 Klein 4B with the FORA cacher, the torchao fp8 quantizer, and the torch_compile compiler, then compare baseline and optimized latency.

./flux2klein4b_tutorial.ipynb

Make Stable Diffusion 3x Faster with DeepCache

Optimize your diffusion model with deepcache caching.

./sd_deepcache.ipynb

Optimize and Deploy Sana diffusers with Pruna and Hugging Face

Optimize and deploy your diffusion model with torchao and gradio.

./deploying_sana_tutorial.ipynb

Smashing at Finer Granularity with Target Modules

Learn how to use the target_modules parameter to target specific modules in your model.

./target_modules_quanto.ipynb

Blazingly Fast Computer Vision

Optimize any computer vision model with x_fast compilation.

./computer_vision.ipynb

Recover Quality after Quantization

Recover quality using text_to_image_perp after diffusers_int8 quantization.

./recovery.ipynb

Distribute across GPUs with Ring Attention

Distribute your Flux model across multiple GPUs with ring_attn and torch_compile.

./ring_attn.ipynb

Reducing Warm-up Time for Compilation

Reduce warm-up time significantly when re-loading a torch_compile compiled model on a new machine.

./portable_compilation.ipynb

Quantize and Speed up any LLM

Optimize the latency and memory footprint of any LLM with hqq quantization and torch_compile compilation.

./llm_quantization_compilation_acceleration.ipynb