Tutorials Pruna
These tutorials will guide you through the process of using pruna to optimize your models. Looking for pruna_pro tutorials? Check out the Tutorials Pruna Pro page.
Compress with an hqq_diffusers quantizer and a deepcache cacher, and evaluate with throughput, total time, and clip_score.
Compress with a torch_compile compiler and a flash_attn3 kernel, and evaluate with total time, latency, throughput, co2_emissions, and energy_consumed.
Compress with hqq quantization and torch_compile compilation, and evaluate with elapsed_time and perplexity.
Compress with hqq quantization and torch_compile compilation, and evaluate with total time, perplexity, throughput, and energy_consumed.
Speed up automatic speech recognition (ASR) with c_whisper compilation and whisper_s2t batching.
Compile your model with torch_compile and openvino for faster inference.
Speed up diffusers with torch_compile compilation and hqq_diffusers quantization.
Evaluate image generation quality with CMMD and EvaluationAgent.
Optimize your diffusion model with hqq_diffusers quantization in 8 bits.
Optimize your diffusion model with deepcache caching.
Optimize and deploy your diffusion model with torchao and gradio.
Learn how to use the target_modules parameter to target specific modules in your model.
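Several of the tutorials above report total time, latency, throughput, and perplexity. As a rough, library-independent sketch of how these quantities relate to one another, the snippet below computes them with the Python standard library only. It is not Pruna's implementation: the function names `measure_timing` and `perplexity`, the dictionary keys, and the per-call definition of latency are assumptions made for illustration.

```python
import math
import time
from typing import Callable, Dict, Sequence


def measure_timing(run_batch: Callable[[], object],
                   n_iterations: int,
                   batch_size: int) -> Dict[str, float]:
    """Time repeated calls of run_batch and derive simple speed metrics.

    Hypothetical helper: run_batch stands in for one inference call
    that processes batch_size items.
    """
    if n_iterations <= 0 or batch_size <= 0:
        raise ValueError("n_iterations and batch_size must be positive")

    start = time.perf_counter()
    for _ in range(n_iterations):
        run_batch()
    total_time = time.perf_counter() - start

    items = n_iterations * batch_size
    return {
        # Wall-clock time for all iterations, in seconds.
        "total_time_s": total_time,
        # Average time per call (assumed definition of latency here).
        "latency_s": total_time / n_iterations,
        # Items processed per second; guarded against a zero timer reading.
        "throughput_items_per_s": items / total_time if total_time > 0 else float("inf"),
    }


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    if not token_log_probs:
        raise ValueError("token_log_probs must not be empty")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Stand-in workload; replace with a call to the model being measured.
    metrics = measure_timing(lambda: sum(range(10_000)), n_iterations=20, batch_size=4)
    print(metrics)
    # A model assigning probability 0.25 to every token has perplexity 4.
    print(perplexity([math.log(0.25)] * 8))
```

Comparing these numbers before and after compression, on the same hardware and inputs, gives a first impression of the speed and quality trade-off that the tutorials measure in more detail.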