Evaluating with CMMD using EvaluationAgent

This tutorial demonstrates how to use the pruna package to evaluate a model. We will use the sdxl-turbo model and a subset of the LAION256 dataset as an example. Any execution times given below are measured on a T4 GPU.

1. Loading the Stable Diffusion Model

First, load your model.

[ ]:
from diffusers import AutoPipelineForText2Image

from pruna.engine.pruna_model import PrunaModel

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
model = PrunaModel(pipe)
pipe.set_progress_bar_config(disable=True)
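
Optionally, you can run a single generation to confirm that the pipeline loads and runs. This is not required for the evaluation; the prompt below is just a placeholder, and sdxl-turbo is designed for single-step, guidance-free inference.

[ ]:
# Optional sanity check (placeholder prompt): generate one image to verify the pipeline works.
image = pipe(
    prompt="a photo of a cat wearing sunglasses",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]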

2. Create Metrics

pruna allows you to pass your metrics requests in 3 ways:

  1. As a plain text request from predefined options (e.g., image_generation_quality)

  2. As a list of metric names

  3. As a list of metric instances

Options 1 and 2 use the default settings for each metric. For full control over the metric configuration, use option 3.

The default call_type for cmmd is single, meaning the metric produces a separate score for each evaluated model. To produce a single comparison score between two models, set call_type to pairwise; a commented sketch of the pairwise flow is included after the EvaluationAgent is created below.

To learn more about single and pairwise metrics, please refer to the pruna documentation.

In this example we will use cmmd as our evaluation metric.

[ ]:
# --- Option 1: Using a simple string (default = single mode) ---
# request = "image_generation_quality"


# --- Option 2: Using a list of metric names (default = single mode) ---
request = ["cmmd"]

# --- Option 3: Full control using the class ---
# from pruna.evaluation.metrics import CMMD
# request = [CMMD()]  # For single mode
# request = [CMMD(call_type="pairwise")]  # For pairwise mode

3. Create an EvaluationAgent and a Task with the metrics request

Pruna’s evaluation process uses a Task to define which metrics to compute and which data to evaluate on. The EvaluationAgent then takes this Task, runs model inference, passes the inputs, ground truths, and predictions to each metric, and collects the results.

[ ]:
from pruna.data.pruna_datamodule import PrunaDataModule
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.task import Task

datamodule = PrunaDataModule.from_string("LAION256")
# If you would like to limit the number of samples to evaluate, uncomment the following line
# datamodule.limit_datasets(10)
task = Task(request, datamodule)
eval_agent = EvaluationAgent(task)
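
If you instead request a pairwise metric (Option 3 above), the same Task and EvaluationAgent setup applies: the agent caches the outputs of the first evaluated model and returns a single comparison score once the second model is evaluated (see the pruna documentation for details). The commented sketch below illustrates this flow and is not executed in this tutorial.

[ ]:
# Sketch only (not executed): pairwise CMMD with the same Task/EvaluationAgent flow.
# The agent caches the first model's outputs and returns one comparison score
# when the second model (e.g., the smashed model created below) is evaluated.
# from pruna.evaluation.metrics import CMMD
# pairwise_task = Task([CMMD(call_type="pairwise")], datamodule)
# pairwise_agent = EvaluationAgent(pairwise_task)
# pairwise_agent.evaluate(model)                 # first model: outputs are cached
# print(pairwise_agent.evaluate(smashed_pipe))   # second model: single comparison score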

4. Evaluate the first model

We can evaluate the first model even before smashing.

This is done by calling the evaluate method of the EvaluationAgent.

[ ]:
# Optional: tweak model generation parameters for benchmarking
model.inference_handler.model_args.update(
    {"num_inference_steps": 1, "guidance_scale": 0.0}
)

base_results = eval_agent.evaluate(model)
print(base_results)

5. Smash the model

Smash the model as usual.

[ ]:
import copy

from pruna import smash
from pruna.config.smash_config import SmashConfig
from pruna.engine.utils import safe_memory_cleanup

smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"


copy_pipe = copy.deepcopy(pipe)
smashed_pipe = smash(copy_pipe, smash_config)
smashed_pipe.set_progress_bar_config(disable=True)
# Optional: tweak model generation parameters for benchmarking
smashed_pipe.inference_handler.model_args.update(
    {"num_inference_steps": 1, "guidance_scale": 0.0}
)
safe_memory_cleanup()
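
Optionally, generate an image with the smashed model for a quick qualitative check before computing metrics. This assumes the smashed model forwards calls to the underlying pipeline, as the wrapper returned by smash does; the prompt is again just a placeholder.

[ ]:
# Optional qualitative check (placeholder prompt): the smashed model can be
# called like the original pipeline.
smashed_image = smashed_pipe(
    prompt="a photo of a cat wearing sunglasses",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]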

6. Evaluate the subsequent model

The EvaluationAgent can compare any pair of models: a baseline model with a smashed model, two smashed models, or even two baseline models.

In this example, we now evaluate the smashed model by calling the evaluate method of the EvaluationAgent again.

[ ]:
smashed_results = eval_agent.evaluate(smashed_pipe)
print(smashed_results)
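
To compare the two models at a glance, you can print both result sets side by side; with CMMD, a lower score indicates generated images closer to the reference distribution.

[ ]:
# Print both result sets side by side for a quick comparison of the CMMD scores.
print("Base model results:   ", base_results)
print("Smashed model results:", smashed_results)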