Smashing Automatic Speech Recognition Models with x-fast
This tutorial demonstrates how to use the pruna package to optimize a custom Whisper model. The result is a smashed Whisper model. We use openai/whisper-large-v3 as the example.
Loading the ASR model
First, load your ASR model.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "openai/whisper-large-v3"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, attn_implementation="eager"
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)
Initializing the Smash Config
Next, initialize the smash_config.
from pruna_engine.SmashConfig import SmashConfig
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['task'] = 'audio_text_transcription'
smash_config['compilers'] = ['x-fast']
# Uncomment the following line to quantize the model to 16 bits
# smash_config['quantizers'] = ['half']
Smashing the Model
Now, smash the model.
from pruna.smash import smash
# Smash the model
smashed_model = smash(
    model=model,
    api_key='<your-api-key>',  # replace <your-api-key> with your actual API key
    smash_config=smash_config,
)
Remember to replace the api_key value with the key provided by PrunaAI.
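To keep the key out of the notebook itself, one option is to read it from an environment variable. The sketch below assumes a variable named PRUNA_API_KEY; that name is hypothetical and chosen only for illustration, it is not something the pruna package defines.

```python
import os


def read_api_key(env_var: str = "PRUNA_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing.

    PRUNA_API_KEY is a hypothetical variable name used for illustration.
    """
    # Strip surrounding whitespace so a stray newline does not end up in the key
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

The returned value can then be passed as api_key=read_api_key() in the smash call above.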
Preparing the Input
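# Load one long-form audio sample from the validation split
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]
# Convert the waveform to input features on the same device and dtype as the model
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features.to(device, dtype=torch_dtype)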
prompt = processor.get_decoder_prompt_ids(language="english", task="transcribe")
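Whisper works on fixed 30-second windows, and the sample here comes from a long-form dataset, so it helps to know what shape to expect from the processor. The sketch below works through the arithmetic with the usual Whisper feature-extractor settings (16 kHz audio, a hop length of 160 samples, 128 mel bins for large-v3). These constants are assumptions stated for illustration and are worth checking against processor.feature_extractor for your own checkpoint.

```python
# Assumed Whisper feature-extractor settings (verify for your checkpoint)
SAMPLING_RATE = 16_000   # samples per second
CHUNK_SECONDS = 30       # length of one model window
HOP_LENGTH = 160         # samples between consecutive spectrogram frames
NUM_MEL_BINS = 128       # large-v3; earlier checkpoints use 80

SAMPLES_PER_WINDOW = SAMPLING_RATE * CHUNK_SECONDS  # 480000


def expected_feature_shape(batch_size: int = 1) -> tuple:
    """Shape of input_features for one padded or truncated 30-second window."""
    frames = SAMPLES_PER_WINDOW // HOP_LENGTH  # 3000
    return (batch_size, NUM_MEL_BINS, frames)


def windows_needed(num_samples: int) -> int:
    """Number of 30-second windows required to cover a clip (ceiling division)."""
    return -(-num_samples // SAMPLES_PER_WINDOW)


print(expected_feature_shape())                # (1, 128, 3000)
print(windows_needed(75 * SAMPLING_RATE))      # 3 windows for a 75-second clip
```

With its default arguments the processor pads or truncates each clip to a single window, so audio beyond the first 30 seconds is not part of input_features in the snippet above.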
Running the Model
Finally, run the model to transcribe the audio file. The first call may take noticeably longer because additional compilation happens during it; subsequent calls run much faster.
# Generate token IDs, then decode them to text
results = model.generate(input_features)
processor.batch_decode(results, skip_special_tokens=False)
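To see the difference between the first call and later ones, a small timing helper can be useful. The following is a minimal sketch; time_calls is a hypothetical helper written for this tutorial, not part of pruna or transformers.

```python
import time


def time_calls(fn, n_runs: int = 3) -> list:
    """Call fn n_runs times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Example with a stand-in workload; replace the lambda with your own call,
# e.g. lambda: model.generate(input_features)
durations = time_calls(lambda: sum(range(100_000)), n_runs=3)
print([f"{d:.4f}s" for d in durations])
```

When timing the model, the first entry includes the compilation time mentioned above. On a GPU, kernels are launched asynchronously, so calling torch.cuda.synchronize() at the end of the timed function gives more reliable numbers.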
Wrap Up
Congratulations! You have successfully smashed an ASR model. You can now use the pruna package to optimize any custom ASR model. The only parts you should need to modify are step 1 (loading the ASR model) and step 5 (running the model) to fit your use case.