Paraformer models
Hint
Please refer to Installation to install sherpa-onnx before you read this section.
csukuangfj/sherpa-onnx-streaming-paraformer-bilingual-zh-en (Chinese + English)
Note
This model does not support timestamps. It is a bilingual model, supporting both Chinese and English. (支持普通话、河南话、天津话、四川话等方言)
This model is converted from
The code for converting can be found at
https://huggingface.co/csukuangfj/streaming-paraformer-zh
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
tar xvf sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
Please check that the file sizes of the pre-trained models are correct. See
the file sizes of *.onnx
files below.
sherpa-onnx-streaming-paraformer-bilingual-zh-en fangjun$ ls -lh *.onnx
-rw-r--r-- 1 fangjun staff 68M Aug 14 09:53 decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 218M Aug 14 09:55 decoder.onnx
-rw-r--r-- 1 fangjun staff 158M Aug 14 09:54 encoder.int8.onnx
-rw-r--r-- 1 fangjun staff 607M Aug 14 09:57 encoder.onnx
Decode a single wave file
Hint
It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.
fp32
The following code shows how to use fp32
models to decode a wave file:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx \
--tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
--paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx \
--paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx \
./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav
Note
Please use ./build/bin/Release/sherpa-onnx.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav
OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx"), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=True, max_active_paths=4, context_score=1.5, decoding_method="greedy_search")
./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav
Elapsed seconds: 2.2, Real time factor (RTF): 0.21
昨天是 monday today day is 零八二 the day after tomorrow 是星期三
{"is_final":false,"segment":0,"start_time":0.0,"text":"昨天是 monday today day is 零八二 the day after tomorrow 是星期三","timestamps":"[]","tokens":["昨","天","是","mon@@","day","today","day","is","零","八","二","the","day","after","tom@@","or@@","row","是","星","期","三"]}
int8
The following code shows how to use int8
models to decode a wave file:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx \
--tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
--paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx \
--paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx \
./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav
Note
Please use ./build/bin/Release/sherpa-onnx.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav
OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx"), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=True, max_active_paths=4, context_score=1.5, decoding_method="greedy_search")
./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav
Elapsed seconds: 1.6, Real time factor (RTF): 0.15
昨天是 monday today day is 零八二 the day after tomorrow 是星期三
{"is_final":false,"segment":0,"start_time":0.0,"text":"昨天是 monday today day is 零八二 the day after tomorrow 是星期三","timestamps":"[]","tokens":["昨","天","是","mon@@","day","today","day","is","零","八","二","the","day","after","tom@@","or@@","row","是","星","期","三"]}
Real-time speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone \
--tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
--paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx \
--paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx
Hint
If your system is Linux (including embedded Linux), you can also use
sherpa-onnx-alsa to do real-time speech recognition with your
microphone if sherpa-onnx-microphone
does not work for you.
csukuangfj/sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en (Chinese + Cantonese + English)
Note
This model does not support timestamps. It is a trilingual model, supporting
both Chinese and English. (支持普通话、粤语
、河南话、天津话、四川话等方言)
This model is converted from
You can find the conversion code after downloading and unzipping the model.
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.tar.bz2
tar xvf sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.tar.bz2
Please check that the file sizes of the pre-trained models are correct. See
the file sizes of *.onnx
files below.
sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en fangjun$ ls -lh *.onnx
-rw-r--r-- 1 fangjun staff 69M Feb 29 19:44 decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 218M Feb 29 19:44 decoder.onnx
-rw-r--r-- 1 fangjun staff 159M Feb 29 19:44 encoder.int8.onnx
-rw-r--r-- 1 fangjun staff 607M Feb 29 19:44 encoder.onnx
Decode a single wave file
Hint
It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.
fp32
The following code shows how to use fp32
models to decode a wave file:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx \
--tokens=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/tokens.txt \
--paraformer-encoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/encoder.onnx \
--paraformer-decoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/decoder.onnx \
./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
Note
Please use ./build/bin/Release/sherpa-onnx.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/encoder.int8.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/decoder.int8.onnx ./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/encoder.int8.onnx", decoder="./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/decoder.int8.onnx"), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), tokens="./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=True, max_active_paths=4, hotwords_score=1.5, hotwords_file="", decoding_method="greedy_search", blank_penalty=0)
./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
Elapsed seconds: 0.98, Real time factor (RTF): 0.16
有无人知道湾仔活道系点去
{ "text": "有无人知道湾仔活道系点去", "tokens": [ "有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去" ], "timestamps": [ ], "ys_probs": [ ], "lm_probs": [ ], "context_scores": [ ], "segment": 0, "start_time": 0.00, "is_final": false}
int8
The following code shows how to use int8
models to decode a wave file:
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx \
--tokens=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/tokens.txt \
--paraformer-encoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/encoder.int8.onnx \
--paraformer-decoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/decoder.int8.onnx \
./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
Note
Please use ./build/bin/Release/sherpa-onnx.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
You should see the following output:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/encoder.int8.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/decoder.int8.onnx ./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/encoder.int8.onnx", decoder="./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/decoder.int8.onnx"), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), tokens="./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=True, max_active_paths=4, hotwords_score=1.5, hotwords_file="", decoding_method="greedy_search", blank_penalty=0)
./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
Elapsed seconds: 0.84, Real time factor (RTF): 0.14
有无人知道湾仔活道系点去
{ "text": "有无人知道湾仔活道系点去", "tokens": [ "有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去" ], "timestamps": [ ], "ys_probs": [ ], "lm_probs": [ ], "context_scores": [ ], "segment": 0, "start_time": 0.00, "is_final": false}
Real-time speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone \
--tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
--paraformer-encoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/encoder.int8.onnx \
--paraformer-decoder=./sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en/decoder.int8.onnx
Hint
If your system is Linux (including embedded Linux), you can also use
sherpa-onnx-alsa to do real-time speech recognition with your
microphone if sherpa-onnx-microphone
does not work for you.