Evaluating with CMMD using EvaluationAgent
This tutorial demonstrates how to use the pruna package to evaluate a model. We will use the sdxl-turbo model and a subset of the LAION256 dataset as an example. Any execution times given below were measured on a T4 GPU.
1. Loading the Stable Diffusion Model
First, load the SDXL-Turbo pipeline and wrap it in a PrunaModel so the EvaluationAgent can run inference on it.
[ ]:
from diffusers import AutoPipelineForText2Image
from pruna.engine.pruna_model import PrunaModel

# Load the SDXL-Turbo pipeline and silence its per-image progress bar
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
pipe.set_progress_bar_config(disable=True)

# Wrap the pipeline so it can be passed to the EvaluationAgent
model = PrunaModel(pipe)
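The pipeline above runs in full precision on the CPU by default. If a GPU is available, you can optionally load it in half precision and move it to the device before wrapping it. The following is a minimal sketch that assumes a CUDA device is present:
[ ]:
import torch
from diffusers import AutoPipelineForText2Image
from pruna.engine.pruna_model import PrunaModel

# Optional: load the pipeline in fp16 and run it on the GPU (assumes CUDA is available)
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
pipe.set_progress_bar_config(disable=True)

model = PrunaModel(pipe)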
2. Create Metrics
pruna allows you to pass your metrics request in three ways:
1. As a plain text request from predefined options (e.g., image_generation_quality)
2. As a list of metric names
3. As a list of metric instances
Options 1 and 2 use the default settings for each metric. For full control over the metric class, use option 3.
The default call_type for cmmd is single. This means that the metric will produce a score for each model. To create one comparison score between two models, set call_type to pairwise. To learn more about single and pairwise, please refer to the pruna documentation.
In this example we will use cmmd as our evaluation metric.
[ ]:
# --- Option 1: Using a plain text request (default = single mode) ---
# request = "image_generation_quality"

# --- Option 2: Using a list of metric names (default = single mode) ---
request = ["cmmd"]
# --- Option 3: Full control using the class ---
# from pruna.evaluation.metrics import CMMD
# request = [CMMD()] # For single mode
# request = [CMMD(call_type="pairwise")] # For pairwise mode
3. Create an EvaluationAgent and a Task with the metrics request
Pruna’s evaluation process uses a Task to define which metrics to calculate and provide the evaluation data. The EvaluationAgent then takes this Task and handles running the model inference, passing the inputs, ground truth, and predictions to each metric, and collecting the results.
[ ]:
from pruna.data.pruna_datamodule import PrunaDataModule
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.task import Task
datamodule = PrunaDataModule.from_string("LAION256")
# If you would like to limit the number of samples to evaluate, uncomment the following line
# datamodule.limit_datasets(10)
task = Task(request, datamodule)
eval_agent = EvaluationAgent(task)
4. Evaluate the first model
We can evaluate the first model even before smashing. This is done by calling the evaluate method of the EvaluationAgent.
[ ]:
# Optional: tweak model generation parameters for benchmarking
model.inference_handler.model_args.update(
{"num_inference_steps": 1, "guidance_scale": 0.0}
)
base_results = eval_agent.evaluate(model)
print(base_results)
5. Smash the model
Smash the model as usual.
[ ]:
import copy
from pruna import smash
from pruna.config.smash_config import SmashConfig
from pruna.engine.utils import safe_memory_cleanup
smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"
copy_pipe = copy.deepcopy(pipe)
smashed_pipe = smash(copy_pipe, smash_config)
smashed_pipe.set_progress_bar_config(disable=True)
# Optional: tweak model generation parameters for benchmarking
smashed_pipe.inference_handler.model_args.update(
{"num_inference_steps": 1, "guidance_scale": 0.0}
)
safe_memory_cleanup()
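Before evaluating, you can optionally sanity-check the smashed model by generating a single image. The sketch below assumes the PrunaModel returned by smash can be called like the underlying diffusers pipeline, and uses an arbitrary example prompt:
[ ]:
# Optional sanity check: generate one image with the smashed model
# (assumes the wrapper delegates calls to the diffusers pipeline)
image = smashed_pipe(
    "a photo of a red fox in the snow",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image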
6. Evaluate the subsequent model
EvaluationAgent allows you to compare any pair of models: a baseline model with a smashed model, two smashed models, or even two baseline models.
In this example, we now evaluate the smashed model by again calling the evaluate method of the EvaluationAgent.
[ ]:
smashed_results = eval_agent.evaluate(smashed_pipe)
print(smashed_results)
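To compare the two runs, you can print the scores side by side. The snippet below is only a sketch: it assumes the returned results are iterable and that each entry exposes name and result attributes; adjust the attribute names if they differ in your pruna version.
[ ]:
# Print base vs. smashed scores next to each other
# (assumes each result object has `name` and `result` attributes)
for base, smashed in zip(base_results, smashed_results):
    print(f"{base.name}: base={base.result:.4f} | smashed={smashed.result:.4f}")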