Smashing Automatic Speech Recognition Models into a Pipeline
This tutorial demonstrates how to use the pruna package to optimize any custom Whisper model. The result is a smashed Whisper model wrapped in an efficient pipeline. We use the openai/whisper-large-v3 model as an example.
Loading the ASR model
First, load your ASR model.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline, AutoTokenizer, AutoFeatureExtractor
from datasets import load_dataset
import tokenizers
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "openai/whisper-large-v3"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True, low_cpu_mem_usage=True,
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)
Initializing the Smash Config
Next, initialize the smash_config.
from pruna_engine.SmashConfig import SmashConfig
# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['compilers'] = ['ws2t', 'c_whisper']
smash_config['processor'] = processor
# Uncomment the following line to quantize the model weights to 8 bits
# smash_config['weight_quantization_bits'] = 8
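The settings above can also be collected in a plain dictionary and applied in one pass, which turns the optional quantization into a function argument instead of a commented-out line. This is only a sketch: the helper names `whisper_settings` and `apply_settings` are hypothetical and not part of pruna, and it assumes nothing about the config object beyond the item assignment already used in this tutorial.

```python
from typing import Any, Dict, MutableMapping, Optional


def whisper_settings(processor: Any, quantization_bits: Optional[int] = None) -> Dict[str, Any]:
    """Build the settings used in this tutorial (hypothetical helper, not a pruna API)."""
    settings: Dict[str, Any] = {
        "compilers": ["ws2t", "c_whisper"],
        "processor": processor,
    }
    # Quantization is optional, mirroring the commented-out line in the tutorial.
    if quantization_bits is not None:
        settings["weight_quantization_bits"] = quantization_bits
    return settings


def apply_settings(config: MutableMapping, settings: Dict[str, Any]) -> MutableMapping:
    """Copy each setting into the config through item assignment."""
    for key, value in settings.items():
        config[key] = value
    return config


# Usage with the objects created earlier in the tutorial:
# apply_settings(smash_config, whisper_settings(processor, quantization_bits=8))
```

Keeping the settings in one place makes it easier to compare a quantized and a non-quantized run without editing the configuration code.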
Smashing the Model
Now, smash the model.
from pruna.smash import smash
# Smash the model
smashed_model = smash(
    model=model,
    api_key='<your-api-key>',  # replace <your-api-key> with your actual API key
    smash_config=smash_config,
)
Don’t forget to replace the api_key value with the key provided by PrunaAI.
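To keep the key out of the script itself, it can be read from an environment variable. The sketch below is one way to do that; the variable name `PRUNA_API_KEY` and the helper `get_api_key` are assumptions made for illustration, not names defined by pruna.

```python
import os


def get_api_key(env_var: str = "PRUNA_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice; use whatever name suits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    # Reject a missing key as well as the tutorial's placeholder value.
    if not key or key == "<your-api-key>":
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# Usage with the call shown in this tutorial:
# smashed_model = smash(model=model, api_key=get_api_key(), smash_config=smash_config)
```

Failing early with a clear message avoids starting the smashing step with the placeholder key still in place.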
Preparing the Input
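Next, download a sample audio file and keep its path for the transcription step.
# Shell command: run it in a terminal, or prefix it with "!" in a notebook
wget https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac
audio_sample = 'sam_altman_lex_podcast_367.flac'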
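If wget is not available, the same file can be fetched from Python with the standard library. The following is a minimal sketch; `fetch_audio` is a hypothetical helper name, and it skips the download when the file is already present.

```python
import urllib.request
from pathlib import Path

AUDIO_URL = (
    "https://huggingface.co/datasets/reach-vb/random-audios/"
    "resolve/main/sam_altman_lex_podcast_367.flac"
)


def fetch_audio(url: str = AUDIO_URL, dest_dir: str = ".") -> str:
    """Download url into dest_dir unless the file already exists; return the local path."""
    # The local file name is the last path component of the URL.
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return str(dest)


# Usage: audio_sample = fetch_audio()
```

Skipping the download when the file exists keeps repeated runs of the tutorial from fetching the audio again.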
Running the Model
Finally, run the model to transcribe the audio file.
# Transcribe the audio file and display the result
smashed_model(audio_sample)
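Hugging Face speech recognition pipelines return a dictionary with a "text" key. This tutorial does not state the exact return type of the smashed pipeline, so the sketch below assumes it is either such a dictionary or a plain string; `transcript_text` is a hypothetical helper, not part of pruna.

```python
from typing import Any


def transcript_text(result: Any) -> str:
    """Extract the transcript from a pipeline result.

    Assumes the result is a plain string or a dict with a "text" key.
    """
    if isinstance(result, str):
        return result.strip()
    if isinstance(result, dict) and "text" in result:
        return str(result["text"]).strip()
    raise TypeError(f"Unexpected result type: {type(result).__name__}")


# Usage: print(transcript_text(smashed_model(audio_sample)))
```

Raising on an unexpected type makes a change in the output format visible immediately instead of printing an empty transcript.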
Wrap Up
Congratulations! You have successfully smashed an ASR model. You can now use the pruna package to optimize any custom ASR model. To fit your use case, you should only need to modify step 1 (loading the ASR model) and step 5 (running the model).