Export Whisper to ONNX

This section describes how to export Whisper models to onnx.

Available models

Note that we have already exported Whisper models to onnx and they are available from the following huggingface repositories:

Model type          Huggingface repo
------------------  ---------------------------------------------------------------------
tiny.en             https://huggingface.co/csukuangfj/sherpa-onnx-whisper-tiny.en
base.en             https://huggingface.co/csukuangfj/sherpa-onnx-whisper-base.en
small.en            https://huggingface.co/csukuangfj/sherpa-onnx-whisper-small.en
distil-small.en     https://huggingface.co/csukuangfj/sherpa-onnx-whisper-distil-small.en
medium.en           https://huggingface.co/csukuangfj/sherpa-onnx-whisper-medium.en
distil-medium.en    https://huggingface.co/csukuangfj/sherpa-onnx-whisper-distil-medium.en
tiny                https://huggingface.co/csukuangfj/sherpa-onnx-whisper-tiny
base                https://huggingface.co/csukuangfj/sherpa-onnx-whisper-base
small               https://huggingface.co/csukuangfj/sherpa-onnx-whisper-small
medium              https://huggingface.co/csukuangfj/sherpa-onnx-whisper-medium
large               https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large
large-v1            https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large-v1
large-v2            https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large-v2
large-v3            https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large-v3
distil-large-v2     https://huggingface.co/csukuangfj/sherpa-onnx-whisper-distil-large-v2

Note

You can also download them from the sherpa-onnx releases page: https://github.com/k2-fsa/sherpa-onnx/releases

Models ending with .en support only English, while all other models are multilingual.
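
If you prefer to fetch a pre-exported model programmatically, the following is a minimal sketch using the huggingface_hub package (an extra dependency, not used elsewhere in this section; install it with pip install huggingface_hub):

from huggingface_hub import snapshot_download

# Download the pre-exported tiny.en model. This fetches the encoder,
# the decoder, tokens.txt, and the test_wavs directory from the repo
# listed in the table above.
local_dir = snapshot_download(repo_id="csukuangfj/sherpa-onnx-whisper-tiny.en")
print("Model files are in", local_dir)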

If you want to export the models yourself, or want to learn how the models are exported, please read below.

Export to onnx

We use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/whisper/export-onnx.py to export Whisper models to onnx.

First, let us install the dependencies and download the export script:

pip install torch openai-whisper onnxruntime onnx librosa soundfile

git clone https://github.com/k2-fsa/sherpa-onnx/
cd sherpa-onnx/scripts/whisper
python3 ./export-onnx.py --help

It will print the following message:

usage: export-onnx.py [-h] --model
                      {tiny,tiny.en,base,base.en,small,small.en,medium,medium.en,large,large-v1,large-v2,large-v3,distil-medium.en,distil-small.en,distil-large-v2,medium-aishell}

optional arguments:
  -h, --help            show this help message and exit
  --model {tiny,tiny.en,base,base.en,small,small.en,medium,medium.en,large,large-v1,large-v2,large-v3,distil-medium.en,distil-small.en,distil-large-v2,medium-aishell}

Example 1: Export tiny.en

To export tiny.en, we can use:

python3 ./export-onnx.py --model tiny.en

It will generate the following files:

(py38) fangjuns-MacBook-Pro:whisper fangjun$ ls -lh tiny.en-*
-rw-r--r--  1 fangjun  staff   105M Aug  7 15:43 tiny.en-decoder.int8.onnx
-rw-r--r--  1 fangjun  staff   185M Aug  7 15:43 tiny.en-decoder.onnx
-rw-r--r--  1 fangjun  staff    12M Aug  7 15:43 tiny.en-encoder.int8.onnx
-rw-r--r--  1 fangjun  staff    36M Aug  7 15:43 tiny.en-encoder.onnx
-rw-r--r--  1 fangjun  staff   816K Aug  7 15:43 tiny.en-tokens.txt

tiny.en-encoder.onnx is the encoder model and tiny.en-decoder.onnx is the decoder model.

tiny.en-encoder.int8.onnx is the quantized encoder model and tiny.en-decoder.int8.onnx is the quantized decoder model.
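
To see what the exported models expect, you can load them with onnxruntime (installed above) and print their input and output signatures. This is a minimal sketch, not part of the export script:

import onnxruntime

# Inspect the I/O signature of the exported encoder and decoder.
# Building an InferenceSession does not run any inference.
for filename in ["tiny.en-encoder.onnx", "tiny.en-decoder.onnx"]:
    session = onnxruntime.InferenceSession(
        filename, providers=["CPUExecutionProvider"]
    )
    print(f"=== {filename} ===")
    for node in session.get_inputs():
        print("  input :", node.name, node.shape, node.type)
    for node in session.get_outputs():
        print("  output:", node.name, node.shape, node.type)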

tiny.en-tokens.txt contains the token table, which maps an integer to a token and vice versa.
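
To inspect the token table itself, a few lines of Python suffice. This is a minimal sketch assuming the usual one <token> <id> pair per line; the exact token encoding is whatever export-onnx.py writes:

# Minimal sketch: load tiny.en-tokens.txt into two lookup tables.
# Each non-empty line is assumed to be "<token> <id>"; rsplit is used
# so that tokens which themselves contain spaces are handled.
token2id = {}
id2token = {}
with open("tiny.en-tokens.txt", encoding="utf-8") as f:
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue
        token, idx = line.rsplit(" ", 1)
        token2id[token] = int(idx)
        id2token[int(idx)] = token

print("vocab size:", len(id2token))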

To check whether the exported model works correctly, we can use

https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/whisper/test.py

We use https://huggingface.co/csukuangfj/sherpa-onnx-whisper-tiny.en/resolve/main/test_wavs/0.wav as the test wave.

pip install kaldi-native-fbank
wget https://huggingface.co/csukuangfj/sherpa-onnx-whisper-tiny.en/resolve/main/test_wavs/0.wav

python3 ./test.py \
  --encoder ./tiny.en-encoder.onnx \
  --decoder ./tiny.en-decoder.onnx \
  --tokens ./tiny.en-tokens.txt \
  ./0.wav

To test int8 quantized models, we can use:

python3 ./test.py \
  --encoder ./tiny.en-encoder.int8.onnx \
  --decoder ./tiny.en-decoder.int8.onnx \
  --tokens ./tiny.en-tokens.txt \
  ./0.wav

Example 2: Export large-v3

To export large-v3, we can use:

python3 ./export-onnx.py --model large-v3

It will generate the following files:

(py38) fangjuns-MacBook-Pro:whisper fangjun$ ls -lh large-v3-*
-rw-r--r--  1 fangjun  staff   2.7M Jul 12 20:38 large-v3-decoder.onnx
-rw-r--r--  1 fangjun  staff   3.0G Jul 12 20:38 large-v3-decoder.weights
-rw-r--r--  1 fangjun  staff   744K Jul 12 20:35 large-v3-encoder.onnx
-rw-r--r--  1 fangjun  staff   2.8G Jul 12 20:35 large-v3-encoder.weights
-rw-r--r--  1 fangjun  staff   798K Jul 12 20:32 large-v3-tokens.txt

large-v3-encoder.onnx is the encoder model and large-v3-decoder.onnx is the decoder model.

Note that for large models there are also two additional .weights files: onnx stores weights in an external file when a model exceeds the 2 GB protobuf limit, so each .weights file must be kept in the same directory as its corresponding .onnx file.
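
You can verify that an .onnx file and its external .weights file load together. A minimal sketch, run from the directory that contains both files:

import os
import onnxruntime

# large-v3-encoder.onnx stores only the graph; it references
# large-v3-encoder.weights by relative path, so keep the two together.
assert os.path.exists("large-v3-encoder.weights"), \
    "keep the .weights file next to the .onnx file"

session = onnxruntime.InferenceSession(
    "large-v3-encoder.onnx", providers=["CPUExecutionProvider"]
)
print("Loaded the large-v3 encoder;", len(session.get_inputs()), "inputs")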

large-v3-tokens.txt contains the token table, which maps an integer to a token and vice versa.

To check whether the exported model works correctly, we can use

https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/whisper/test.py

We use https://huggingface.co/csukuangfj/sherpa-onnx-whisper-tiny.en/resolve/main/test_wavs/0.wav as the test wave.

pip install kaldi-native-fbank
wget https://huggingface.co/csukuangfj/sherpa-onnx-whisper-tiny.en/resolve/main/test_wavs/0.wav

python3 ./test.py \
  --encoder ./large-v3-encoder.onnx \
  --decoder ./large-v3-decoder.onnx \
  --tokens ./large-v3-tokens.txt \
  ./0.wav

Hint

We provide a colab notebook for you to try the exported large-v3 onnx model with sherpa-onnx on CPU as well as on GPU.

You will find that the real-time factor (RTF) on a GPU (Tesla T4) is less than 1, i.e., the model transcribes audio faster than real time.
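
RTF is simply the processing time divided by the duration of the audio. A minimal sketch of how it can be measured, where transcribe is a hypothetical placeholder for whatever decoding call you use:

import time

import soundfile as sf  # installed above together with the other dependencies

def measure_rtf(transcribe, wav_path):
    # RTF = processing time / audio duration; a value below 1 means
    # the model runs faster than real time.
    duration = sf.info(wav_path).duration
    start = time.perf_counter()
    transcribe(wav_path)  # hypothetical decoding callable; substitute your own
    elapsed = time.perf_counter() - start
    return elapsed / duration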