Paraformer models

Hint

Please refer to Installation to install sherpa-onnx before you read this section.

csukuangfj/sherpa-onnx-paraformer-trilingual-zh-cantonese-en (Chinese + English + Cantonese)

Note

This model does not support timestamps. It is a trilingual model, supporting Chinese, English, and Cantonese. (It supports Mandarin, Cantonese, and regional varieties such as the Henan, Tianjin, and Sichuan dialects.)

This model is converted from

https://www.modelscope.cn/models/dengcunqin/speech_seaco_paraformer_large_asr_nat-zh-cantonese-en-16k-common-vocab11666-pytorch/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2

# For Chinese users
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2

tar xvf sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The expected sizes of the *.onnx files are shown below.

sherpa-onnx-paraformer-trilingual-zh-cantonese-en$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 234M Mar 10 02:12 model.int8.onnx
-rw-r--r-- 1 1001 127 831M Mar 10 02:12 model.onnx

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.
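
If your audio is not already in this format, you can convert it first. The following is a minimal sketch using ffmpeg; the file names are placeholders and any input format that ffmpeg can read will work:

# -ac 1 produces a single channel (mono)
# -ar 16000 resamples to 16 kHz (optional, since other sampling rates are also accepted)
# -acodec pcm_s16le writes 16-bit signed PCM samples
ffmpeg -i input.mp3 -ac 1 -ar 16000 -acodec pcm_s16le output.wav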

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 0, 13 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 1, 15 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 2, 40 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 3, 41 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 4, 37 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 5, 16 vs -1
Done!

./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
{"text": "有无人知道湾仔活道系点去㗎", "timestamps": [], "tokens":["有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去", "㗎"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav
{"text": "我喺黄大仙九龙塘联合道荡失路啊", "timestamps": [], "tokens":["我", "喺", "黄", "大", "仙", "九", "龙", "塘", "联", "合", "道", "荡", "失", "路", "啊"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演得特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "得", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav
{"text": "它这个管一下都通到有时候都通到七八层楼高然后它这管一下就可以浇到那那柱子上", "timestamps": [], "tokens":["它", "这", "个", "管", "一", "下", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "它", "这", "管", "一", "下", "就", "可", "以", "浇", "到", "那", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["yesterday", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.871 s
Real time factor (RTF): 6.871 / 42.054 = 0.163
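
The real time factor (RTF) shown above is the decoding time divided by the total duration of the audio, so values below 1 mean decoding runs faster than real time. You can verify the total duration yourself; a small sketch, assuming sox (which ships the soxi tool) is installed:

# Print each file's duration in seconds and sum them up
soxi -D ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/*.wav | awk '{s += $1} END {print s, "seconds in total"}'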

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 0, 13 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 1, 15 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 2, 40 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 3, 41 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 4, 37 vs -1
/project/sherpa-onnx/csrc/offline-paraformer-greedy-search-decoder.cc:Decode:65 time stamp for batch: 5, 16 vs -1
Done!

./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/1.wav
{"text": "有无人知道湾仔活道系点去㗎", "timestamps": [], "tokens":["有", "无", "人", "知", "道", "湾", "仔", "活", "道", "系", "点", "去", "㗎"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/2.wav
{"text": "我喺黄大仙九龙塘联合道荡失路啊", "timestamps": [], "tokens":["我", "喺", "黄", "大", "仙", "九", "龙", "塘", "联", "合", "道", "荡", "失", "路", "啊"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演得特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "得", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/5-henan.wav
{"text": "它这个管一下都通到有时候都通到七八层楼高然后它这管一下就可以浇到那那柱子上", "timestamps": [], "tokens":["它", "这", "个", "管", "一", "下", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "它", "这", "管", "一", "下", "就", "可", "以", "浇", "到", "那", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["yesterday", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.290 s
Real time factor (RTF): 6.290 / 42.054 = 0.150
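
sherpa-onnx-offline accepts any number of wave files on the command line, so instead of listing each file you can let the shell expand a glob. A minimal sketch reusing the int8 model from above:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx \
  ./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/test_wavs/*.wav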

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-trilingual-zh-cantonese-en/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-en-2024-03-09 (English)

Note

This model does not support timestamps. It supports only English.

This model is converted from

https://www.modelscope.cn/models/iic/speech_paraformer_asr-en-16k-vocab4199-pytorch/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-en-2024-03-09.tar.bz2

# For Chinese users
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-en-2024-03-09.tar.bz2

tar xvf sherpa-onnx-paraformer-en-2024-03-09.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The expected sizes of the *.onnx files are shown below.

sherpa-onnx-paraformer-en-2024-03-09$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 220M Mar 10 02:12 model.int8.onnx
-rw-r--r-- 1 1001 127 817M Mar 10 02:12 model.onnx

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.
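
If you are not sure whether a file meets these requirements, you can inspect it before decoding. A small sketch using soxi from the sox package, applied here to one of the bundled test files:

# Shows the number of channels, the sample rate, and the sample precision
soxi ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav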

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.onnx \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.onnx ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-en-2024-03-09/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav
{"text": " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels", "timestamps": [], "tokens":["after", "early", "ni@@", "ght@@", "fall", "the", "yel@@", "low", "la@@", "mp@@", "s", "would", "light", "up", "here", "and", "there", "the", "squ@@", "al@@", "id", "quarter", "of", "the", "bro@@", "the@@", "ls"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav
{"text": " god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was 'on' that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven", "timestamps": [], "tokens":["god", "as", "a", "direct", "con@@", "sequence", "of", "the", "sin", "which", "man", "thus", "p@@", "uni@@", "shed", "had", "given", "her", "a", "lo@@", "vely", "child", "whose", "place", "was", "'on'", "that", "same", "di@@", "sh@@", "on@@", "ou@@", "red", "bo@@", "so@@", "m", "to", "connect", "her", "paren@@", "t", "for", "ever", "with", "the", "race", "and", "des@@", "cent", "of", "mor@@", "tal@@", "s", "and", "to", "be", "finally", "a", "bl@@", "essed", "soul", "in", "hea@@", "ven"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
{"text": " yet these thoughts affected hester prynne less with hope than apprehension", "timestamps": [], "tokens":["yet", "these", "thoughts", "aff@@", "ected", "he@@", "ster", "pr@@", "y@@", "n@@", "ne", "less", "with", "hope", "than", "ap@@", "pre@@", "hen@@", "sion"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 7.173 s
Real time factor (RTF): 7.173 / 28.165 = 0.255

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav
{"text": " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels", "timestamps": [], "tokens":["after", "early", "ni@@", "ght@@", "fall", "the", "yel@@", "low", "la@@", "mp@@", "s", "would", "light", "up", "here", "and", "there", "the", "squ@@", "al@@", "id", "quarter", "of", "the", "bro@@", "the@@", "ls"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/1.wav
{"text": " god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was 'on' that same dishonoured bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven", "timestamps": [], "tokens":["god", "as", "a", "direct", "con@@", "sequence", "of", "the", "sin", "which", "man", "thus", "p@@", "uni@@", "shed", "had", "given", "her", "a", "lo@@", "vely", "child", "whose", "place", "was", "'on'", "that", "same", "di@@", "sh@@", "on@@", "ou@@", "red", "bo@@", "so@@", "m", "to", "connect", "her", "paren@@", "t", "for", "ever", "with", "the", "race", "and", "des@@", "cent", "of", "mor@@", "tal@@", "s", "and", "to", "be", "finally", "a", "bl@@", "essed", "soul", "in", "hea@@", "ven"]}
----
./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/8k.wav
{"text": " yet these thoughts affected hester prynne less with hope than apprehension", "timestamps": [], "tokens":["yet", "these", "thoughts", "aff@@", "ected", "he@@", "ster", "pr@@", "y@@", "n@@", "ne", "less", "with", "hope", "than", "ap@@", "pre@@", "hen@@", "sion"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 5.492 s
Real time factor (RTF): 5.492 / 28.165 = 0.195
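
The statistics above were obtained with two threads (num_threads=2 in the config dump). If your CPU has spare cores, you can try raising this value. A sketch, assuming the --num-threads option that corresponds to the num_threads field shown in the config dump above:

./build/bin/sherpa-onnx-offline \
  --num-threads=4 \
  --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx \
  ./sherpa-onnx-paraformer-en-2024-03-09/test_wavs/0.wav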

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-en-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-en-2024-03-09/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-small-2024-03-09 (Chinese + English)

Note

This model does not support timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin and regional varieties such as the Henan, Tianjin, and Sichuan dialects.)

This model is converted from

https://www.modelscope.cn/models/crazyant/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-onnx/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2

# For Chinese users
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-small-2024-03-09.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The expected sizes of the *.onnx files are shown below.

sherpa-onnx-paraformer-zh-small-2024-03-09$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 79M Mar 10 00:48 model.int8.onnx

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-small-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼缸然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "缸", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 3.562 s
Real time factor (RTF): 3.562 / 47.023 = 0.076

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-small-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-small-2024-03-09/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-2024-03-09 (Chinese + English)

Note

This model does not support timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin and regional varieties such as the Henan, Tianjin, and Sichuan dialects.)

This model is converted from

https://www.modelscope.cn/models/crazyant/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-onnx/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2

# For Chinese users
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-2024-03-09.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The expected sizes of the *.onnx files are shown below.

sherpa-onnx-paraformer-zh-2024-03-09$ ls -lh *.onnx

-rw-r--r-- 1 1001 127 217M Mar 10 02:22 model.int8.onnx
-rw-r--r-- 1 1001 127 785M Mar 10 02:22 model.onnx

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2024-03-09/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他的管一向就可以交到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "的", "管", "一", "向", "就", "可", "以", "交", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.829 s
Real time factor (RTF): 6.829 / 47.023 = 0.145

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/8k.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/2-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/4-tianjin.wav
{"text": "其实他就是怕每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "怕", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2024-03-09/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他的管一向就可以交到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "的", "管", "一", "向", "就", "可", "以", "交", "到", "那", "个", "那", "柱", "子", "上"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.829 s
Real time factor (RTF): 6.829 / 47.023 = 0.145

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2024-03-09/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28 (Chinese + English)

Note

This model does not support timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin and regional varieties such as the Henan, Tianjin, and Sichuan dialects.)

This model is converted from

https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch

The code for converting can be found at

https://huggingface.co/csukuangfj/paraformer-onnxruntime-python-example/tree/main

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

# For Chinese users
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The expected sizes of the *.onnx files are shown below.

sherpa-onnx-paraformer-zh-2023-03-28$ ls -lh *.onnx
-rw-r--r-- 1 kuangfangjun root 214M Apr  1 07:28 model.int8.onnx
-rw-r--r-- 1 kuangfangjun root 824M Apr  1 07:28 model.onnx

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-03-28/model.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 8.547 s
Real time factor (RTF): 8.547 / 51.236 = 0.167

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 6.439 s
Real time factor (RTF): 6.439 / 51.236 = 0.126

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx

csukuangfj/sherpa-onnx-paraformer-zh-2023-09-14 (Chinese + English)

Note

This model supports timestamps. It is a bilingual model, supporting both Chinese and English. (It supports Mandarin and regional varieties such as the Henan, Tianjin, and Sichuan dialects.)

This model is converted from

https://www.modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx/summary

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-09-14.tar.bz2

# For Chinese users
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-09-14.tar.bz2

tar xvf sherpa-onnx-paraformer-zh-2023-09-14.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The expected sizes of the *.onnx files are shown below.

sherpa-onnx-paraformer-zh-2023-09-14$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff   232M Sep 14 13:46 model.int8.onnx

Decode wave files

Hint

It supports decoding only single-channel wave files with 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz.

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx \
  --model-type=paraformer \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav \
  ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

/project/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx-offline --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx --model-type=paraformer ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav ./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model="./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx"), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), tokens="./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt", num_threads=2, debug=False, provider="cpu", model_type="paraformer"), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0)
Creating recognizer ...
Started
/project/sherpa-onnx/csrc/offline-stream.cc:AcceptWaveformImpl:119 Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/0.wav
{"text": "对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你", "timestamps": [0.36, 0.48, 0.62, 0.72, 0.86, 1.02, 1.32, 1.74, 1.90, 2.12, 2.20, 2.38, 2.50, 2.62, 2.74, 3.18, 3.32, 3.52, 3.62, 3.74, 3.82, 3.90, 3.98, 4.08, 4.20, 4.34, 4.56, 4.74, 5.10], "tokens":["对", "我", "做", "了", "介", "绍", "啊", "那", "么", "我", "想", "说", "的", "是", "呢", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣", "呢", "你"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/1.wav
{"text": "重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现", "timestamps": [0.16, 0.30, 0.42, 0.56, 0.72, 0.96, 1.08, 1.20, 1.30, 2.08, 2.26, 2.44, 2.58, 2.72, 2.98, 3.14, 3.26, 3.46, 3.62, 3.80, 3.88, 4.02, 4.12, 4.20, 4.36, 4.56], "tokens":["重", "点", "呢", "想", "谈", "三", "个", "问", "题", "首", "先", "呢", "就", "是", "这", "一", "轮", "全", "球", "金", "融", "动", "荡", "的", "表", "现"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/2.wav
{"text": "深入的分析这一次全球金融动荡背后的根源", "timestamps": [0.34, 0.54, 0.66, 0.80, 1.08, 1.52, 1.72, 1.90, 2.40, 2.68, 2.86, 2.96, 3.16, 3.26, 3.46, 3.54, 3.66, 3.80, 3.90], "tokens":["深", "入", "的", "分", "析", "这", "一", "次", "全", "球", "金", "融", "动", "荡", "背", "后", "的", "根", "源"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/3-sichuan.wav
{"text": "自己就是在那个在那个就是在情节里面就是感觉是演的特别好就是好像很真实一样你知道吧", "timestamps": [0.16, 0.30, 0.56, 0.72, 0.92, 1.18, 1.32, 1.88, 2.24, 2.40, 3.16, 3.28, 3.40, 3.54, 3.76, 3.88, 4.06, 4.24, 4.36, 4.56, 4.66, 4.88, 5.14, 5.30, 5.44, 5.60, 5.72, 5.84, 5.96, 6.14, 6.24, 6.38, 6.56, 6.78, 6.98, 7.08, 7.22, 7.38, 7.50, 7.62], "tokens":["自", "己", "就", "是", "在", "那", "个", "在", "那", "个", "就", "是", "在", "情", "节", "里", "面", "就", "是", "感", "觉", "是", "演", "的", "特", "别", "好", "就", "是", "好", "像", "很", "真", "实", "一", "样", "你", "知", "道", "吧"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/4-tianjin.wav
{"text": "其实他就是那每个人都可以守法就这意思法律意识太单薄了而且就是嗯也不顾及到别人的感受", "timestamps": [0.08, 0.24, 0.36, 0.56, 0.66, 0.78, 1.04, 1.14, 1.26, 1.38, 1.50, 1.58, 1.70, 1.84, 2.28, 2.38, 2.64, 2.74, 3.08, 3.28, 3.66, 3.80, 3.94, 4.14, 4.34, 4.64, 4.84, 4.94, 5.12, 5.24, 5.84, 6.10, 6.24, 6.44, 6.54, 6.66, 6.86, 7.02, 7.14, 7.24, 7.44], "tokens":["其", "实", "他", "就", "是", "那", "每", "个", "人", "都", "可", "以", "守", "法", "就", "这", "意", "思", "法", "律", "意", "识", "太", "单", "薄", "了", "而", "且", "就", "是", "嗯", "也", "不", "顾", "及", "到", "别", "人", "的", "感", "受"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/5-henan.wav
{"text": "他这个管一向都通到有时候都通到七八层楼高然后他这管一向就可以浇到那个那柱子上", "timestamps": [0.08, 0.20, 0.30, 0.42, 0.94, 1.14, 1.26, 1.46, 1.66, 2.28, 2.50, 2.62, 2.70, 2.82, 2.98, 3.14, 3.28, 3.52, 3.70, 3.86, 4.94, 5.06, 5.18, 5.30, 5.42, 5.66, 5.76, 5.94, 6.08, 6.24, 6.38, 6.60, 6.78, 6.96, 7.10, 7.30, 7.50, 7.62], "tokens":["他", "这", "个", "管", "一", "向", "都", "通", "到", "有", "时", "候", "都", "通", "到", "七", "八", "层", "楼", "高", "然", "后", "他", "这", "管", "一", "向", "就", "可", "以", "浇", "到", "那", "个", "那", "柱", "子", "上"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/6-zh-en.wav
{"text": " yesterday was 星期一 today is tuesday 明天是星期三", "timestamps": [0.36, 0.60, 0.84, 1.22, 2.24, 2.44, 2.74, 3.52, 4.06, 4.68, 5.00, 5.12, 5.76, 5.96, 6.24, 6.82, 7.02, 7.26], "tokens":["ye@@", "ster@@", "day", "was", "星", "期", "一", "today", "is", "tu@@", "es@@", "day", "明", "天", "是", "星", "期", "三"]}
----
./sherpa-onnx-paraformer-zh-2023-09-14/test_wavs/8k.wav
{"text": "甚至出现交易几乎停滞的情况", "timestamps": [0.48, 0.78, 1.04, 1.18, 1.52, 1.78, 2.06, 2.18, 2.50, 2.66, 2.88, 3.10, 3.30], "tokens":["甚", "至", "出", "现", "交", "易", "几", "乎", "停", "滞", "的", "情", "况"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 9.206 s
Real time factor (RTF): 9.206 / 51.236 = 0.180
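
Since this model emits timestamps, each entry in "timestamps" lines up with the token at the same position in "tokens", giving a per-token start time in seconds. A small sketch, assuming jq is installed (the JSON is shortened from the 8k.wav result above):

echo '{"timestamps": [0.48, 0.78, 1.04], "tokens": ["甚", "至", "出"]}' \
  | jq -r '[.timestamps, .tokens] | transpose[] | "\(.[0])s\t\(.[1])"'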

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-09-14/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-09-14/model.int8.onnx \
  --model-type=paraformer