Models v2
sherpa-onnx-moonshine-base-ar-quantized-2026-02-27 (Arabic)
This model supports only Arabic. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-ar-moonshine_base_ar_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-ar-moonshine_base_ar_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-ar-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-base-ar-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-base-ar-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-base-ar-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-base-ar-quantized-2026-02-27
total 135M
-rw-r--r-- 1 501 staff 105M Feb 27 09:26 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 30M Feb 27 09:26 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:27 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:27 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:27 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate, however, does not need to be 16 kHz.
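To check that a file meets these constraints before decoding, you can inspect its header with Python's standard wave module. This helper is illustrative and not part of sherpa-onnx:

```python
import wave


def check_wav(path: str) -> dict:
    """Report the properties that matter for decoding: the file
    must be mono with 16-bit samples; any sampling rate is fine,
    since audio is resampled to 16 kHz internally."""
    with wave.open(path, "rb") as w:
        info = {
            "channels": w.getnchannels(),            # must be 1
            "sample_width_bytes": w.getsampwidth(),  # must be 2 (16-bit)
            "sample_rate": w.getframerate(),         # any rate is accepted
        }
    info["ok"] = info["channels"] == 1 and info["sample_width_bytes"] == 2
    return info
```

A stereo or 8-bit file would need to be converted (e.g., with sox or ffmpeg) before being passed to sherpa-onnx-offline.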
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.992 s
Started
Done!
./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " لا تسأل ماذا يمكنني أن أقول لك بل لست ماذا إن كنا كأن تقدم الفا", "timestamps": [], "durations": [], "tokens":[" ", "ل", "ا", " ", "ت", "س", "أ", "ل", " ", "م", "ا", "ذ", "ا", " ", "ي", "م", "ك", "ن", "ن", "ي", " ", "أ", "ن", " ", "أ", "ق", "و", "ل", " ", "ل", "ك", " ", "ب", "ل", " ", "ل", "س", "ت", " ", "م", "ا", "ذ", "ا", " ", "إ", "ن", " ", "ك", "ن", "ا", " ", "ك", "أ", "ن", " ", "ت", "ق", "د", "م", " ال", "ف", "ا"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.003 s
Real time factor (RTF): 1.003 / 6.546 = 0.153
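The real time factor (RTF) on the last line is simply the decoding time divided by the audio duration, so a value below 1 means faster-than-real-time decoding:

```python
def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; < 1 is faster than real time."""
    return elapsed_seconds / audio_seconds


# The run above took 1.003 s to decode 6.546 s of audio
rtf = real_time_factor(1.003, 6.546)
print(f"{rtf:.3f}")  # 0.153
```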
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
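The VAD binary above segments the long file at detected pauses and decodes each segment. If you cannot use it, a crude alternative is to cut the file into fixed-length chunks and decode each chunk separately; the stdlib-only sketch below illustrates this, with the caveat that fixed windows may split words mid-utterance, which VAD-based segmentation avoids.

```python
import wave


def split_wav(path: str, chunk_seconds: float, prefix: str) -> list[str]:
    """Split a mono 16-bit wave file into fixed-length chunks and
    return the chunk file names. A naive stand-in for VAD-based
    segmentation: unlike silero-vad, it may cut inside a word."""
    out = []
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames_per_chunk = int(rate * chunk_seconds)
        i = 0
        while True:
            frames = w.readframes(frames_per_chunk)
            if not frames:
                break
            name = f"{prefix}-{i:03d}.wav"
            with wave.open(name, "wb") as chunk:
                chunk.setnchannels(1)
                chunk.setsampwidth(2)
                chunk.setframerate(rate)
                chunk.writeframes(frames)
            out.append(name)
            i += 1
    return out
```

Each resulting chunk can then be passed to sherpa-onnx-offline as in the "Decode wave files" example above.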
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ar-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-base-en-quantized-2026-02-27 (English)
This model supports only English. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-en-moonshine_base_en_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-en-moonshine_base_en_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-en-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-base-en-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-base-en-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-base-en-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-base-en-quantized-2026-02-27
total 135M
-rw-r--r-- 1 501 staff 105M Feb 27 09:27 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 30M Feb 27 09:26 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:27 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:28 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:27 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.367 s
Started
/workspace/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:129 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " Ask not what your country can do for you. Ask what you can do for your country.", "timestamps": [], "durations": [], "tokens":[" Ask", " not", " what", " your", " country", " can", " do", " for", " you", ".", " Ask", " what", " you", " can", " do", " for", " your", " country", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.419 s
Real time factor (RTF): 0.419 / 3.845 = 0.109
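The log above shows that the 24 kHz test file was resampled to 16 kHz automatically, so no manual conversion is needed. If you want to know in advance whether a resampler will be created, the file's rate can be read with Python's standard wave module (an illustrative helper, not part of sherpa-onnx):

```python
import wave


def needs_resampling(path: str, target_rate: int = 16000) -> bool:
    """True if the file's sampling rate differs from the model's
    16 kHz input rate, i.e. a resampler will be created."""
    with wave.open(path, "rb") as w:
        return w.getframerate() != target_rate
```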
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-en-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-base-es-quantized-2026-02-27 (Spanish)
This model supports only Spanish. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-es-moonshine_base_es_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-es-moonshine_base_es_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-es-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-base-es-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-base-es-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-base-es-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-base-es-quantized-2026-02-27
total 63M
-rw-r--r-- 1 501 staff 42M Feb 27 09:26 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 20M Feb 27 09:26 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:27 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:24 test_wavs
-rw-r--r-- 1 501 staff 520K Feb 27 09:27 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.979 s
Started
/workspace/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:129 Creating a resampler:
in_sample_rate: 22050
output_sample_rate: 16000
Done!
./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " No preguntes qué puede hacer tu país por ti. Pregunta qué puedes hacer qué por tu país.", "timestamps": [], "durations": [], "tokens":[" ", " No", " preg", "unt", "es", " qu", "é", " puede", " hacer", " tu", " país", " por", " ti", ".", " P", "reg", "unta", " qu", "é", " pu", "edes", " hacer", " qu", "é", " por", " tu", " país", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 5.113 s
Real time factor (RTF): 5.113 / 5.329 = 0.959
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-es-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-base-ja-quantized-2026-02-27 (Japanese)
This model supports only Japanese. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-ja-moonshine_base_ja_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-ja-moonshine_base_ja_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-ja-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-base-ja-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-base-ja-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-base-ja-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-base-ja-quantized-2026-02-27
total 135M
-rw-r--r-- 1 501 staff 105M Feb 27 09:27 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 30M Feb 27 09:27 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:27 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:25 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:27 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 2.402 s
Started
/workspace/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:129 Creating a resampler:
in_sample_rate: 44100
output_sample_rate: 16000
Done!
./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " 国があなたのために何ができるかを問うのではなく、あなたが国のために何ができるかを問うてください。", "timestamps": [], "durations": [], "tokens":[" ", "国", "が", "あ", "な", "た", "の", "た", "め", "に", "何", "が", "で", "き", "る", "か", "を", "<0xE5>", "<0x95>", "<0x8F>", "う", "の", "で", "は", "な", "く", "、", "あ", "な", "た", "が", "国", "の", "た", "め", "に", "何", "が", "で", "き", "る", "か", "を", "<0xE5>", "<0x95>", "<0x8F>", "う", "て", "く", "だ", "さ", "い", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.968 s
Real time factor (RTF): 0.968 / 8.162 = 0.119
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-ja-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-base-uk-quantized-2026-02-27 (Ukrainian)
This model supports only Ukrainian. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-uk-moonshine_base_uk_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-uk-moonshine_base_uk_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-uk-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-base-uk-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-base-uk-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-base-uk-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-base-uk-quantized-2026-02-27
total 135M
-rw-r--r-- 1 501 staff 105M Feb 27 09:27 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 30M Feb 27 09:27 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:28 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:25 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:28 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.866 s
Started
Done!
./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " ти питай що твоя країна може зробити для тебе папитай що ти можеш зробити для своєї країни", "timestamps": [], "durations": [], "tokens":[" ти", " пи", "тай", " що", " т", "во", "я", " краї", "на", " може", " з", "ро", "би", "ти", " для", " те", "бе", " па", "пи", "тай", " що", " ти", " може", "ш", " з", "ро", "би", "ти", " для", " сво", "є", "ї", " краї", "ни"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.701 s
Real time factor (RTF): 0.701 / 6.000 = 0.117
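The real time factor (RTF) reported above is simply the processing time divided by the audio duration; a value below 1 means decoding runs faster than real time. A quick sketch of the arithmetic using the figures from this log:

```python
def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / audio duration; < 1 means faster than real time."""
    return elapsed_seconds / audio_seconds

# From the log above: 0.701 s to decode 6.000 s of audio
print(round(real_time_factor(0.701, 6.000), 3))  # 0.117
```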
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-uk-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-base-vi-quantized-2026-02-27 (Vietnamese)
This model supports only Vietnamese. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-vi-moonshine_base_vi_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-vi-moonshine_base_vi_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-vi-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-base-vi-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-base-vi-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-base-vi-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-base-vi-quantized-2026-02-27
total 135M
-rw-r--r-- 1 501 staff 105M Feb 27 09:27 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 30M Feb 27 09:27 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:28 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:26 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:28 tokens.txt
Decode wave files
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.263 s
Started
Done!
./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " đừng hỏi đất nước có thể làm gì cho bạn hãy hỏi bạn có thể làm gì cho đất nước", "timestamps": [], "durations": [], "tokens":[" ", "đ", "<0xE1>", "<0xBB>", "<0xAB>", "ng", " h", "<0xE1>", "<0xBB>", "<0x8F>", "i", " ", "đ", "ấ", "t", " n", "ư", "ớ", "c", " có", " th", "ể", " là", "m", " g", "ì", " cho", " b", "ạ", "n", " h", "ã", "y", " h", "<0xE1>", "<0xBB>", "<0x8F>", "i", " b", "ạ", "n", " có", " th", "ể", " là", "m", " g", "ì", " cho", " ", "đ", "ấ", "t", " n", "ư", "ớ", "c"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.913 s
Real time factor (RTF): 0.913 / 4.017 = 0.227
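Some entries in the tokens array above, such as "<0xE1>", "<0xBB>", "<0xAB>", are byte-fallback tokens: each stands for one raw UTF-8 byte, and consecutive byte tokens combine into a single character. A small illustrative sketch of how such tokens merge back into text (the helper name is our own, not part of sherpa-onnx):

```python
import re

def merge_byte_tokens(tokens):
    """Join tokens, converting <0xNN> byte-fallback tokens into raw UTF-8 bytes."""
    out = bytearray()
    for t in tokens:
        m = re.fullmatch(r"<0x([0-9A-Fa-f]{2})>", t)
        if m:
            out.append(int(m.group(1), 16))  # one raw UTF-8 byte
        else:
            out.extend(t.encode("utf-8"))    # ordinary text token
    return out.decode("utf-8")

# The first tokens of the output above spell " đừng":
# the three byte tokens form the UTF-8 encoding of "ừ".
print(merge_byte_tokens([" ", "đ", "<0xE1>", "<0xBB>", "<0xAB>", "ng"]))  # " đừng"
```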
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-vi-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-base-zh-quantized-2026-02-27 (Chinese)
This model supports only Chinese. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-zh-moonshine_base_zh_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-zh-moonshine_base_zh_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-base-zh-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-base-zh-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-base-zh-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-base-zh-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-base-zh-quantized-2026-02-27
total 135M
-rw-r--r-- 1 501 staff 105M Feb 27 09:26 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 30M Feb 27 09:26 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:28 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:26 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:28 tokens.txt
Decode wave files
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 1.341 s
Started
/workspace/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:129 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " 不要问你的国家能为你做什么,而要问你能为你的国家做什么。", "timestamps": [], "durations": [], "tokens":[" ", "不", "要", "问", "你", "的", "国", "家", "能", "为", "你", "<0xE5>", "<0x81>", "<0x9A>", "<0xE4>", "<0xBB>", "<0x80>", "么", ",", "而", "要", "问", "你", "能", "为", "你", "的", "国", "家", "<0xE5>", "<0x81>", "<0x9A>", "<0xE4>", "<0xBB>", "<0x80>", "么", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.694 s
Real time factor (RTF): 0.694 / 4.759 = 0.146
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-base-zh-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27 (English)
This model supports only English. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-en-moonshine_tiny_en_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-en-moonshine_tiny_en_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27
total 43M
-rw-r--r-- 1 501 staff 30M Feb 27 09:26 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 13M Feb 27 09:26 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:28 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:26 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:28 tokens.txt
Decode wave files
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 0.706 s
Started
/workspace/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:129 Creating a resampler:
in_sample_rate: 24000
output_sample_rate: 16000
Done!
./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " Ask not what your country can do for you. Ask what you can do for your country.", "timestamps": [], "durations": [], "tokens":[" Ask", " not", " what", " your", " country", " can", " do", " for", " you", ".", " Ask", " what", " you", " can", " do", " for", " your", " country", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.214 s
Real time factor (RTF): 0.214 / 3.845 = 0.056
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27 (Japanese)
This model supports only Japanese. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-ja-moonshine_tiny_ja_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-ja-moonshine_tiny_ja_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27
total 69M
-rw-r--r-- 1 501 staff 56M Feb 27 09:27 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 13M Feb 27 09:27 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:28 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:26 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:28 tokens.txt
Decode wave files
Hint
It supports decoding only single-channel wave files with 16-bit encoded samples; the sampling rate, however, does not need to be 16 kHz.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 0.789 s
Started
/workspace/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:129 Creating a resampler:
in_sample_rate: 44100
output_sample_rate: 16000
Done!
./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " 国が花とのために何ができるかを問うのではなく、あなたが国のために何ができるかを問うてください。", "timestamps": [], "durations": [], "tokens":[" ", "国", "が", "花", "と", "の", "た", "め", "に", "何", "が", "で", "き", "る", "か", "を", "<0xE5>", "<0x95>", "<0x8F>", "う", "の", "で", "は", "な", "く", "、", "あ", "な", "た", "が", "国", "の", "た", "め", "に", "何", "が", "で", "き", "る", "か", "を", "<0xE5>", "<0x95>", "<0x8F>", "う", "て", "く", "だ", "さ", "い", "。"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.942 s
Real time factor (RTF): 0.942 / 8.162 = 0.115
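The real time factor (RTF) reported above is the ratio of processing time to audio duration; values below 1 mean the recognizer runs faster than real time. A minimal sketch of the computation (the helper name is our own):

```python
# Real time factor (RTF) = elapsed processing time / audio duration.
# An RTF below 1 means the recognizer runs faster than real time.
def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    return elapsed_seconds / audio_seconds

# Figures from the log above: 0.942 s elapsed for an 8.162 s clip.
print(round(real_time_factor(0.942, 8.162), 3))  # 0.115
```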
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
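If no long recording is at hand, one can stitch a longer test file together from the bundled clip using Python's standard wave module. A minimal sketch; the helper name and the output filename (matching the command above) are only examples:

```python
import wave

def concat_wav(src: str, dst: str, repeats: int) -> None:
    """Write `repeats` back-to-back copies of `src` into `dst`."""
    with wave.open(src, "rb") as r:
        params = r.getparams()          # channels, sample width, rate, ...
        frames = r.readframes(r.getnframes())
    with wave.open(dst, "wb") as w:
        w.setparams(params)             # header is patched on close
        for _ in range(repeats):
            w.writeframes(frames)

# e.g. concat_wav(
#     "./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/test_wavs/0.wav",
#     "a-very-long-audio-file.wav",
#     50,
# )
```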
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ja-quantized-2026-02-27/tokens.txt
sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27 (Korean)
This model supports only Korean. In the following, we describe how to use it with sherpa-onnx.
Real-time/streaming speech recognition on Android
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html and select the file
sherpa-onnx-<version>-arm64-v8a-simulated_streaming_asr-ko-moonshine_tiny_ko_2026_02_27.apk
Note
For instance, if you choose version 1.12.27, you should use sherpa-onnx-1.12.27-arm64-v8a-simulated_streaming_asr-ko-moonshine_tiny_ko_2026_02_27.apk
The source code for the APK can be found at
Please refer to Build sherpa-onnx for Android for how to build our Android demo.
Download the model
Please use the following commands to download the model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27.tar.bz2
tar xvf sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27.tar.bz2
rm sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27.tar.bz2
ls -lh sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27
You should get the following output:
ls -lh sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27
total 69M
-rw-r--r-- 1 501 staff 56M Feb 27 09:27 decoder_model_merged.ort
-rw-r--r-- 1 501 staff 13M Feb 27 09:27 encoder_model.ort
-rw-r--r-- 1 501 staff 14K Feb 27 09:28 LICENSE
drwxr-xr-x 2 501 staff 4.0K Mar 3 07:27 test_wavs
-rw-r--r-- 1 501 staff 537K Feb 27 09:28 tokens.txt
Decode wave files
Hint
It supports decoding only wave files with a single channel and 16-bit encoded samples. The sampling rate does not need to be 16 kHz; files with other sampling rates are resampled automatically.
cd /path/to/sherpa-onnx
build/bin/sherpa-onnx-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/tokens.txt \
./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/test_wavs/0.wav
The output is given below:
/workspace/sherpa-onnx/csrc/parse-options.cc:Read:373 sherpa-onnx-offline --moonshine-encoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/encoder_model.ort --moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/decoder_model_merged.ort --tokens=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/tokens.txt ./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1, enable_token_timestamps=False, enable_segment_timestamps=False), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/encoder_model.ort", uncached_decoder="", cached_decoder="", merged_decoder="./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/decoder_model_merged.ort"), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), funasr_nano=OfflineFunASRNanoModelConfig(encoder_adaptor="", llm="", embedding="", tokenizer="", system_prompt="You are a helpful assistant.", user_prompt="语音转写:", max_new_tokens=512, temperature=1e-06, top_p=0.8, seed=42, language="", itn=True, hotwords=""), medasr=OfflineMedAsrCtcModelConfig(model=""), fire_red_asr_ctc=OfflineFireRedAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", 
lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
recognizer created in 0.797 s
Started
/workspace/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:129 Creating a resampler:
in_sample_rate: 22050
output_sample_rate: 16000
Done!
./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": " ▁조국이 당신을 위해 무엇을 해줄 수 있는지 묻지 말고 당신이 조국을 위해 무엇을 할 수 있는지 물으십시오.", "timestamps": [], "durations": [], "tokens":[" ▁", "조", "국", "이", " ", "<0xEB>", "<0x8B>", "<0xB9>", "신", "을", " ", "위", "해", " ", "무", "<0xEC>", "<0x97>", "<0x87>", "을", " ", "해", "<0xEC>", "<0xA4>", "<0x84>", " ", "수", " ", "<0xEC>", "<0x9E>", "<0x88>", "는", "지", " ", "<0xEB>", "<0xAC>", "<0xBB>", "지", " ", "<0xEB>", "<0xA7>", "<0x90>", "고", " ", "<0xEB>", "<0x8B>", "<0xB9>", "신", "이", " ", "조", "국", "을", " ", "위", "해", " ", "무", "<0xEC>", "<0x97>", "<0x87>", "을", " ", "<0xED>", "<0x95>", "<0xA0>", " ", "수", " ", "<0xEC>", "<0x9E>", "<0x88>", "는", "지", " ", "<0xEB>", "<0xAC>", "<0xBC>", "<0xEC>", "<0x9C>", "<0xBC>", "<0xEC>", "<0x8B>", "<0xAD>", "시", "오", "."], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.869 s
Real time factor (RTF): 0.869 / 6.917 = 0.126
Decode long files with a VAD
The following example demonstrates how to use the model to decode a long wave file.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/tokens.txt \
./a-very-long-audio-file.wav
Real-time/streaming speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/tokens.txt
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--moonshine-encoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/encoder_model.ort \
--moonshine-merged-decoder=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/decoder_model_merged.ort \
--tokens=./sherpa-onnx-moonshine-tiny-ko-quantized-2026-02-27/tokens.txt