Models
We provide 8-bit quantized ONNX models for Moonshine.
You can find scripts for model quantization at
In the following, we describe how to use Moonshine models with pre-built executables in sherpa-onnx.
sherpa-onnx-moonshine-tiny-en-int8
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
tar xvf sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
rm sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
After downloading and extracting the archive, you should see something like the following:
ls -lh sherpa-onnx-moonshine-tiny-en-int8/
total 242160
-rw-r--r-- 1 fangjun staff 1.0K Oct 26 09:42 LICENSE
-rw-r--r-- 1 fangjun staff 175B Oct 26 09:42 README.md
-rw-r--r-- 1 fangjun staff 43M Oct 26 09:42 cached_decode.int8.onnx
-rw-r--r-- 1 fangjun staff 17M Oct 26 09:42 encode.int8.onnx
-rw-r--r-- 1 fangjun staff 6.5M Oct 26 09:42 preprocess.onnx
drwxr-xr-x 6 fangjun staff 192B Oct 26 09:42 test_wavs
-rw-r--r-- 1 fangjun staff 426K Oct 26 09:42 tokens.txt
-rw-r--r-- 1 fangjun staff 51M Oct 26 09:42 uncached_decode.int8.onnx
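Before running the commands that follow, it can help to confirm that extraction produced every file they refer to. The following is a minimal sketch, not part of sherpa-onnx: `check_model_dir` is a hypothetical helper name, and the file list is simply copied from the listing above.

```shell
# check_model_dir DIR
# Hypothetical helper: report any model file from the listing above that is
# missing in DIR. Returns 0 if all files are present, 1 otherwise.
check_model_dir() {
  missing=0
  for f in preprocess.onnx encode.int8.onnx uncached_decode.int8.onnx \
           cached_decode.int8.onnx tokens.txt; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $1/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_model_dir ./sherpa-onnx-moonshine-tiny-en-int8 && echo "model files OK"
```

The same function should work for any model directory that uses these five file names.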
Decode wave files
Hint
Only single-channel (mono) wave files with 16-bit samples are supported. The sampling rate does not need to be 16 kHz.
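To check whether a file meets the channel and sample-width requirement before decoding it, one option is to read the two relevant header fields directly. The sketch below is not part of sherpa-onnx. It assumes the canonical 44-byte PCM WAV header, with the channel count at byte offset 22 and the bits per sample at byte offset 34. Files with extra chunks before the `fmt ` chunk would need a real parser. `le16` and `is_mono_16bit_wav` are hypothetical helper names.

```shell
# le16 FILE OFFSET
# Print the little-endian unsigned 16-bit integer stored at byte OFFSET of FILE.
le16() {
  od -An -tu1 -j"$2" -N2 "$1" | awk '{ print $1 + $2 * 256 }'
}

# is_mono_16bit_wav FILE
# Print the two header fields and succeed only for 1 channel and 16 bits.
# Assumption: canonical 44-byte WAV header (channels at 22, bits at 34).
is_mono_16bit_wav() {
  channels=$(le16 "$1" 22)
  bits=$(le16 "$1" 34)
  echo "channels=$channels bits_per_sample=$bits"
  [ "$channels" = "1" ] && [ "$bits" = "16" ]
}

# Usage:
#   is_mono_16bit_wav ./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav
```

The sampling rate is deliberately not checked, since it does not need to be 16 kHz.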
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx \
--moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx \
--moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx \
--tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt \
--num-threads=1 \
./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx --moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx --moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx --moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx --tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt --num-threads=1 ./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx", encoder="./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx", uncached_decoder="./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx", cached_decoder="./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!
./sherpa-onnx-moonshine-tiny-en-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " After early nightfall, the yellow lamps would light up here and there the squalid quarter of the brothels.", "timestamps": [], "tokens":[" After", " early", " night", "fall", ",", " the", " yellow", " l", "amps", " would", " light", " up", " here", " and", " there", " the", " squ", "al", "id", " quarter", " of", " the", " bro", "th", "els", "."], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.213 s
Real time factor (RTF): 0.213 / 6.625 = 0.032
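The recognition result is printed as a single JSON line among the log lines. If only the transcript is needed, a small filter can pull out the `text` field. This is a rough sketch rather than a proper JSON parser: it assumes one JSON object per line and no escaped double quotes inside the text. `extract_text` is a hypothetical helper name.

```shell
# extract_text
# Read log output on stdin and print the "text" field of every JSON result
# line, without its leading spaces. Lines with no "text" field are dropped.
# Assumption: the text contains no escaped double quotes.
extract_text() {
  sed -n 's/.*"text": " *\([^"]*\)".*/\1/p'
}

# Demo on a shortened line in the same format as the output above.
printf '%s\n' 'Started' \
  '{"lang": "", "text": " After early nightfall.", "tokens":[" After"], "words": []}' \
  | extract_text
```

To use it on a real run, pipe the decoding command into `extract_text`. Whether the result goes to standard output or standard error is not stated here, so appending `2>&1` before the pipe covers both cases.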
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx \
--moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx \
--moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx \
--tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx \
--moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx \
--moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx \
--tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt
sherpa-onnx-moonshine-base-en-int8
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-en-int8.tar.bz2
tar xvf sherpa-onnx-moonshine-base-en-int8.tar.bz2
rm sherpa-onnx-moonshine-base-en-int8.tar.bz2
After downloading and extracting the archive, you should see something like the following:
ls -lh sherpa-onnx-moonshine-base-en-int8/
total 560448
-rw-r--r-- 1 fangjun staff 1.0K Oct 26 09:42 LICENSE
-rw-r--r-- 1 fangjun staff 175B Oct 26 09:42 README.md
-rw-r--r-- 1 fangjun staff 95M Oct 26 09:42 cached_decode.int8.onnx
-rw-r--r-- 1 fangjun staff 48M Oct 26 09:42 encode.int8.onnx
-rw-r--r-- 1 fangjun staff 13M Oct 26 09:42 preprocess.onnx
drwxr-xr-x 6 fangjun staff 192B Oct 26 09:42 test_wavs
-rw-r--r-- 1 fangjun staff 426K Oct 26 09:42 tokens.txt
-rw-r--r-- 1 fangjun staff 116M Oct 26 09:42 uncached_decode.int8.onnx
Decode wave files
Hint
Only single-channel (mono) wave files with 16-bit samples are supported. The sampling rate does not need to be 16 kHz.
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx \
--moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx \
--moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx \
--tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt \
--num-threads=1 \
./sherpa-onnx-moonshine-base-en-int8/test_wavs/0.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx --moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx --moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx --moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx --tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt --num-threads=1 ./sherpa-onnx-moonshine-base-en-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx", encoder="./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx", uncached_decoder="./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx", cached_decoder="./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx"), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-en-int8/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!
./sherpa-onnx-moonshine-base-en-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " After early nightfall, the yellow lamps would light up here and there the squalid quarter of the brothels.", "timestamps": [], "tokens":[" After", " early", " night", "fall", ",", " the", " yellow", " l", "amps", " would", " light", " up", " here", " and", " there", " the", " squ", "al", "id", " quarter", " of", " the", " bro", "th", "els", "."], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 0.438 s
Real time factor (RTF): 0.438 / 6.625 = 0.066
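The real time factor in the log is the elapsed decoding time divided by the duration of the audio, so values below 1 mean decoding is faster than real time. The sketch below reproduces that arithmetic for the two runs shown on this page. `rtf` is a hypothetical helper name, and the numbers are copied from the logs.

```shell
# rtf ELAPSED_SECONDS AUDIO_SECONDS
# Print elapsed / duration rounded to three decimals.
rtf() {
  awk -v elapsed="$1" -v duration="$2" \
    'BEGIN { printf "%.3f\n", elapsed / duration }'
}

rtf 0.213 6.625   # tiny model run: prints 0.032
rtf 0.438 6.625   # base model run: prints 0.066
```

On this particular machine and test file, the base model takes roughly twice as long as the tiny model. Timings on other hardware will differ.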
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx \
--moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx \
--moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx \
--tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-preprocessor=./sherpa-onnx-moonshine-base-en-int8/preprocess.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-int8/encode.int8.onnx \
--moonshine-uncached-decoder=./sherpa-onnx-moonshine-base-en-int8/uncached_decode.int8.onnx \
--moonshine-cached-decoder=./sherpa-onnx-moonshine-base-en-int8/cached_decode.int8.onnx \
--tokens=./sherpa-onnx-moonshine-base-en-int8/tokens.txt
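All commands on this page pass the same five model flags and differ only in the model directory. As a convenience, a small shell function can build those flags from the directory name. This is a sketch, not something shipped with sherpa-onnx: `moonshine_flags` is a hypothetical helper name, and it assumes the directory path contains no whitespace, since the result is meant to be word-split by the shell.

```shell
# moonshine_flags MODEL_DIR
# Print the five Moonshine model flags used on this page, one per line,
# for the int8 model files inside MODEL_DIR.
moonshine_flags() {
  printf '%s\n' \
    "--moonshine-preprocessor=$1/preprocess.onnx" \
    "--moonshine-encoder=$1/encode.int8.onnx" \
    "--moonshine-uncached-decoder=$1/uncached_decode.int8.onnx" \
    "--moonshine-cached-decoder=$1/cached_decode.int8.onnx" \
    "--tokens=$1/tokens.txt"
}

# Show the flags for the base model.
moonshine_flags ./sherpa-onnx-moonshine-base-en-int8

# Usage with an executable (unquoted on purpose so each flag is one argument):
#   ./build/bin/sherpa-onnx-microphone-offline \
#     $(moonshine_flags ./sherpa-onnx-moonshine-base-en-int8)
```

Switching between the tiny and base models then only requires changing the directory argument.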