Models

We provide 8-bit quantized ONNX models for Moonshine.

You can find scripts for model quantization at

The sections below describe how to use the Moonshine models with the pre-built executables in sherpa-onnx.

sherpa-onnx-moonshine-tiny-en-int8

Use the following commands to download and extract the model:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
tar xvf sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
rm sherpa-onnx-moonshine-tiny-en-int8.tar.bz2

After downloading, you should see something like the following:

ls -lh sherpa-onnx-moonshine-tiny-en-int8/
total 242160
-rw-r--r--  1 fangjun  staff   1.0K Oct 26 09:42 LICENSE
-rw-r--r--  1 fangjun  staff   175B Oct 26 09:42 README.md
-rw-r--r--  1 fangjun  staff    43M Oct 26 09:42 cached_decode.int8.onnx
-rw-r--r--  1 fangjun  staff    17M Oct 26 09:42 encode.int8.onnx
-rw-r--r--  1 fangjun  staff   6.5M Oct 26 09:42 preprocess.onnx
drwxr-xr-x  6 fangjun  staff   192B Oct 26 09:42 test_wavs
-rw-r--r--  1 fangjun  staff   426K Oct 26 09:42 tokens.txt
-rw-r--r--  1 fangjun  staff    51M Oct 26 09:42 uncached_decode.int8.onnx
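Before running the commands in the following sections, it can help to confirm that the archive extracted completely. The snippet below is a minimal sketch, not part of sherpa-onnx: the function name check_moonshine_dir is hypothetical, and the file list is simply taken from the listing above.

```shell
# Hypothetical helper: verify that a Moonshine model directory contains the
# five files that the commands on this page refer to.
check_moonshine_dir() {
  dir=${1%/}   # drop a trailing slash, if any
  rc=0
  for f in preprocess.onnx encode.int8.onnx uncached_decode.int8.onnx \
           cached_decode.int8.onnx tokens.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_moonshine_dir ./sherpa-onnx-moonshine-tiny-en-int8 returns 0 when all five files are present and prints each missing file to standard error otherwise.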

Decode wave files

Hint

Only single-channel (mono) wave files with 16-bit encoded samples are supported. The sampling rate does not need to be 16 kHz.
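To check whether a file meets these requirements, the header fields can be read directly. The following is a rough sketch under two stated assumptions: the file uses the common 44-byte PCM WAV header (the fmt chunk directly follows "WAVE"), and the host is little-endian, because od reads integers in host byte order. The function name wav_info is hypothetical.

```shell
# Hypothetical helper: print "channels bits_per_sample sample_rate" for a WAV
# file with the canonical 44-byte header. Offsets: channels at byte 22,
# sample rate at byte 24, bits per sample at byte 34.
wav_info() {
  channels=$(od -An -tu2 -j22 -N2 "$1" | tr -d ' \n')
  rate=$(od -An -tu4 -j24 -N4 "$1" | tr -d ' \n')
  bits=$(od -An -tu2 -j34 -N2 "$1" | tr -d ' \n')
  echo "$channels $bits $rate"
}
```

A supported file prints 1 as the first field and 16 as the second; the third field is the sampling rate in Hz. Files with extra chunks before the fmt chunk need a real audio tool instead of this shortcut.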

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx \
  --moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx \
  --moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx \
  --moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx \
  --tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt \
  --num-threads=1 \
  ./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav

Note

On Windows, use ./build/bin/Release/sherpa-onnx-offline.exe instead.

You should see output similar to the following:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx --moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx --moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx --moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx --tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt --num-threads=1 ./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx", encoder="./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx", uncached_decoder="./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx", cached_decoder="./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!

./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " After early nightfall, the yellow lamps would light up here and there the squalid quarter of the brothels.", "timestamps": [], "tokens":[" After", " early", " night", "fall", ",", " the", " yellow", " l", "amps", " would", " light", " up", " here", " and", " there", " the", " squ", "al", "id", " quarter", " of", " the", " bro", "th", "els", "."], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.213 s
Real time factor (RTF): 0.213 / 6.625 = 0.032
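The real time factor in the last line is the elapsed decoding time divided by the duration of the audio, so a value below 1 means decoding ran faster than real time. As a quick check of the arithmetic with the numbers from the log above (the function name rtf is illustrative only):

```shell
# Real time factor = elapsed seconds / audio duration in seconds.
rtf() {
  LC_ALL=C awk -v elapsed="$1" -v duration="$2" \
    'BEGIN { printf "%.3f\n", elapsed / duration }'
}

rtf 0.213 6.625   # prints 0.032
```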

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx \
  --moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx \
  --moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx \
  --moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx \
  --tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx \
  --moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx \
  --moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx \
  --moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx \
  --tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt

sherpa-onnx-moonshine-base-en-int8

Use the following commands to download and extract the model:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-en-int8.tar.bz2
tar xvf sherpa-onnx-moonshine-base-en-int8.tar.bz2
rm sherpa-onnx-moonshine-base-en-int8.tar.bz2

After downloading, you should see something like the following:

ls -lh sherpa-onnx-moonshine-base-en-int8/
total 560448
-rw-r--r--  1 fangjun  staff   1.0K Oct 26 09:42 LICENSE
-rw-r--r--  1 fangjun  staff   175B Oct 26 09:42 README.md
-rw-r--r--  1 fangjun  staff    95M Oct 26 09:42 cached_decode.int8.onnx
-rw-r--r--  1 fangjun  staff    48M Oct 26 09:42 encode.int8.onnx
-rw-r--r--  1 fangjun  staff    13M Oct 26 09:42 preprocess.onnx
drwxr-xr-x  6 fangjun  staff   192B Oct 26 09:42 test_wavs
-rw-r--r--  1 fangjun  staff   426K Oct 26 09:42 tokens.txt
-rw-r--r--  1 fangjun  staff   116M Oct 26 09:42 uncached_decode.int8.onnx

Decode wave files

Hint

Only single-channel (mono) wave files with 16-bit encoded samples are supported. The sampling rate does not need to be 16 kHz.

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx \
  --moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx \
  --moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx \
  --moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx \
  --tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt \
  --num-threads=1 \
  ./sherpa-onnx-moonshine-base-en-int8/test_wavs/0.wav

Note

On Windows, use ./build/bin/Release/sherpa-onnx-offline.exe instead.

You should see output similar to the following:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx --moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx --moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx --moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx --tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt --num-threads=1 ./sherpa-onnx-moonshine-base-en-int8/test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx", encoder="./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx", uncached_decoder="./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx", cached_decoder="./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-en-int8/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!

./sherpa-onnx-moonshine-base-en-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " After early nightfall, the yellow lamps would light up here and there the squalid quarter of the brothels.", "timestamps": [], "tokens":[" After", " early", " night", "fall", ",", " the", " yellow", " l", "amps", " would", " light", " up", " here", " and", " there", " the", " squ", "al", "id", " quarter", " of", " the", " bro", "th", "els", "."], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.438 s
Real time factor (RTF): 0.438 / 6.625 = 0.066
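For the same 6.625 s test file, the tiny model log on this page reports 0.213 s of elapsed time and the base model log reports 0.438 s. The sketch below computes the ratio between the two; it reflects a single run on one machine and one file, so treat it as an illustration rather than a benchmark. The function name slowdown is hypothetical.

```shell
# Ratio of two elapsed times, rounded to two decimals.
slowdown() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

slowdown 0.438 0.213   # base vs. tiny on 0.wav; prints 2.06
```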

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx \
  --moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx \
  --moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx \
  --moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx \
  --tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt

Speech recognition from a microphone with VAD

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

./build/bin/sherpa-onnx-vad-microphone-offline-asr \
  --silero-vad-model=./silero_vad.onnx \
  --moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx \
  --moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx \
  --moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx \
  --moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx \
  --tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt
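All commands on this page pass the same five model flags and differ only in the model directory. A small shell function can remove that repetition. This is a convenience sketch, not something shipped with sherpa-onnx: the name moonshine_flags is hypothetical, while the flag names and file names are the ones used above.

```shell
# Hypothetical helper: print the five model flags for a Moonshine model
# directory, one per line.
moonshine_flags() {
  dir=${1%/}   # drop a trailing slash, if any
  printf '%s\n' \
    "--moonshine-preprocessor=$dir/preprocess.onnx" \
    "--moonshine-encoder=$dir/encode.int8.onnx" \
    "--moonshine-uncached-decoder=$dir/uncached_decode.int8.onnx" \
    "--moonshine-cached-decoder=$dir/cached_decode.int8.onnx" \
    "--tokens=$dir/tokens.txt"
}
```

It can then be used as ./build/bin/sherpa-onnx-offline $(moonshine_flags ./sherpa-onnx-moonshine-base-en-int8) --num-threads=1 ./sherpa-onnx-moonshine-base-en-int8/test_wavs/0.wav. The unquoted substitution relies on word splitting, so the directory path must not contain whitespace.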