Smashing Automatic Speech Recognition Models into a Pipeline

This tutorial demonstrates how to use the pruna package to optimize any custom Whisper model. The result is a smashed Whisper model wrapped in an efficient pipeline. We use the openai/whisper-large-v3 model as an example.

Loading the ASR model

First, load your ASR model and its processor.

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor


device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True, low_cpu_mem_usage=True,
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

Initializing the Smash Config

Next, initialize the smash_config.

from pruna_engine.SmashConfig import SmashConfig

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['compilers'] = ['ws2t', 'c_whisper']
smash_config['processor'] = processor
# Uncomment the following line to quantize the model weights to 8 bits
# smash_config['weight_quantization_bits'] = 8
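
To get a feel for what the optional 8-bit weight quantization could save, the following back-of-the-envelope sketch estimates the storage needed for the weights alone. It assumes roughly 1.55 billion parameters for Whisper large and ignores activations, buffers, and any layers that might stay in higher precision, so treat the numbers as an approximation rather than a measurement.

```python
# Rough weight-memory estimate; ignores activations, buffers, and any
# layers a quantizer may leave in higher precision.
NUM_PARAMS = 1_550_000_000  # approximate parameter count of Whisper large (assumption)


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int8_gb = weight_memory_gb(NUM_PARAMS, 8)
print(f"float16 weights: {fp16_gb:.2f} GB")
print(f"8-bit weights:   {int8_gb:.2f} GB")
```

Under these assumptions, halving the bits per weight halves the weight storage. The actual footprint of a smashed model may differ.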

Smashing the Model

Now, smash the model.

from pruna.smash import smash

# Smash the model
smashed_model = smash(
    model=model,
    api_key='<your-api-key>',  # replace <your-api-key> with your actual API key
    smash_config=smash_config,
)

Don’t forget to replace the api_key placeholder with the key provided by PrunaAI.
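
To keep the key out of the script itself, one option is to read it from an environment variable. The sketch below is not part of the pruna package: the helper read_api_key and the variable name PRUNA_API_KEY are hypothetical names chosen for illustration.

```python
import os


def read_api_key(env_var: str = "PRUNA_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing."""
    # PRUNA_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key
```

With the variable exported in your shell, you could then pass api_key=read_api_key() to smash instead of a hard-coded string.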

Preparing the Input

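Then, download a sample audio file and store its path.

# Shell command; prefix it with "!" when running in a notebook
wget https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac
audio_sample = 'sam_altman_lex_podcast_367.flac'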
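
If wget is not available, or you prefer to stay in Python, the download can be sketched with the standard library. The helper download_if_missing is a hypothetical name introduced here. It fetches the same URL as the wget command above and skips the download when a non-empty file already exists.

```python
import urllib.request
from pathlib import Path

# Same URL as in the wget command above.
AUDIO_URL = (
    "https://huggingface.co/datasets/reach-vb/random-audios/"
    "resolve/main/sam_altman_lex_podcast_367.flac"
)


def download_if_missing(url: str, destination: str) -> str:
    """Download url to destination unless a non-empty file is already there."""
    path = Path(destination)
    if not path.is_file() or path.stat().st_size == 0:
        urllib.request.urlretrieve(url, str(path))
    return str(path)
```

Calling audio_sample = download_if_missing(AUDIO_URL, 'sam_altman_lex_podcast_367.flac') would then give the same audio_sample path as before.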

Running the Model

Finally, run the model to transcribe the audio file.

# Transcribe the audio file and display the result
smashed_model(audio_sample)
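
Because the point of smashing is efficiency, it can be useful to time the call. The helper below is a generic sketch, not a pruna feature: median_latency is a hypothetical name, and it measures wall-clock time for any callable.

```python
import time
from statistics import median
from typing import Any, Callable


def median_latency(
    fn: Callable[..., Any], *args: Any, repeats: int = 3, warmup: int = 1
) -> float:
    """Return the median wall-clock time of fn(*args) in seconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return median(timings)
```

For example, median_latency(smashed_model, audio_sample) would time the smashed pipeline on the sample file. Timing a baseline the same way gives a like-for-like comparison. On a GPU this assumes the callable only returns once its work has finished.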

Wrap Up

Congratulations! You have successfully smashed an ASR model. You can now use the pruna package to optimize any custom ASR model. The only parts you should need to modify are step 1 (loading the model) and step 5 (running the model) to fit your use case.