Tutorials Pruna
These tutorials will guide you through the process of using pruna to optimize your models. Looking for pruna_pro tutorials? Check out the Tutorials Pruna Pro page.
Compress with a hqq_diffusers quantizer and a deepcache cacher, and evaluate with throughput, total time, and clip_score.
Compress with a torch_compile compiler and a flash_attn3 kernel, and evaluate with total time, latency, throughput, co2_emissions, and energy_consumed.
Compress with hqq quantization and torch_compile compilation and evaluate with elapsed_time and perplexity.
Compress with hqq quantization and torch_compile compilation and evaluate with total time, perplexity, throughput and energy_consumed.
Speed up automatic speech recognition (ASR) with c_whisper compilation and whisper_s2t batching.
Compile your model with torch_compile and openvino for faster inference.
Speed up diffusers with torch_compile compilation and hqq_diffusers quantization.
Evaluate image generation quality with CMMD and EvaluationAgent.
Optimize your diffusion model with hqq_diffusers quantization in 8 bits.
Optimize Flux2 Klein 4B with FORA cacher, torchao fp8 quantizer, and torch_compile compiler; compare baseline vs optimized latency.
Optimize your diffusion model with deepcache caching.
Optimize and deploy your diffusion model with torchao and gradio.
Learn how to use the target_modules parameter to target specific modules in your model.
Optimize any computer vision model with x_fast compilation.
Recover quality using text_to_image_perp after diffusers_int8 quantization.
Distribute your Flux model across multiple GPUs with ring_attn and torch_compile.
Reduce warm-up time significantly when re-loading a torch_compile compiled model on a new machine.
Optimize latency and memory footprint of any LLM with hqq quantization and torch_compile compilation.
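
Several of the tutorials above report total time, latency, throughput, and perplexity. As a rough orientation, the sketch below shows one common way to derive these numbers in plain Python. It is not pruna's implementation: the function names `measure_timing` and `perplexity`, the per-batch definition of latency, and the absence of warm-up runs or device synchronization are all assumptions made for illustration, and the definitions used in the tutorials may differ.

```python
import math
import time
from typing import Callable, Dict, Sequence


def measure_timing(run_batch: Callable[[], object], n_batches: int, batch_size: int) -> Dict[str, float]:
    """Hypothetical helper: time n_batches calls and derive simple speed metrics.

    Assumes run_batch() processes exactly batch_size samples and returns
    only once the work is finished (no asynchronous GPU work pending).
    """
    start = time.perf_counter()
    for _ in range(n_batches):
        run_batch()
    total_time = time.perf_counter() - start

    return {
        # Wall-clock time for all batches, in seconds.
        "total_time_s": total_time,
        # Average time per batch, in seconds.
        "latency_s_per_batch": total_time / n_batches,
        # Samples processed per second (guarded against a zero-length interval).
        "throughput_samples_per_s": (
            (n_batches * batch_size) / total_time if total_time > 0 else float("inf")
        ),
    }


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Hypothetical helper: exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("token_log_probs must not be empty")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Stand-in workload; a real comparison would call the base and optimized models.
    metrics = measure_timing(lambda: sum(range(100_000)), n_batches=10, batch_size=4)
    print(metrics)
    # A model that assigns probability 0.25 to every token.
    print(perplexity([math.log(0.25)] * 8))
```

Running the same helper on a baseline model and on its optimized counterpart, with identical inputs and batch sizes, gives a like-for-like comparison. Lower latency and perplexity are better, and higher throughput is better.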