Flux generation in a heartbeat, literally (Pro)

This tutorial demonstrates how to use the pruna package to optimize your Flux model for faster inference. Any execution times given below are measured on an A100 GPU.

1. Loading the Flux Model

First, load your Flux model.

[ ]:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# pipe.enable_model_cpu_offload() # save some VRAM by offloading to CPU. Remove this if you have enough GPU memory

2. Initializing the Smash Config

Next, initialize the smash_config.

[ ]:

from pruna_pro import SmashConfig, smash

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config["cacher"] = "auto"
smash_config["auto_cache_mode"] = "taylor"
smash_config["auto_speed_factor"] = 0.4  # Lower is faster, but reduces quality
smash_config["compiler"] = "torch_compile"

3. Smashing the Model

Now, you can smash the model, which can take up to 2 minutes. Don’t forget to replace the token by the one provided by PrunaAI.

[ ]:

pipe = smash(
    model=pipe,
    token="<your_pruna_token>",
    smash_config=smash_config,
)

4. Running the Model

After the model has been compiled, we run inference for a few iterations as warm-up. You can remove torch_compile from the compiler argument in the smash_config if you prefer instant speed-up without warm-up iterations.

[ ]:

prompt = (
    "An anime illustration of Sydney Opera House sitting next to Eiffel tower, under a blue night sky of "
    "roiling energy, exploding yellow stars, and radiating swirls of blue."
)

for _ in range(5):
    pipe(prompt, num_inference_steps=50).images[0]

Run the model to generate images with accelerated inference.

[ ]:

pipe(prompt, num_inference_steps=50).images[0]

The speed factor can be adjusted on the fly to match your needs.

[ ]:

pipe.cache_helper.set_params(speed_factor=0.33)
pipe(prompt, num_inference_steps=50).images[0]

Wrap Up

Congratulations! You have successfully smashed a Flux model. Enjoy the speed-up!