Conformer-transducer-based Models

Hint

Please refer to Installation to install sherpa-onnx before you read this section.

csukuangfj/sherpa-onnx-conformer-zh-stateless2-2023-05-23 (Chinese)

This model is converted from

https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2

which supports only Chinese as it is trained on the WenetSpeech corpus.

You can find the training code at

https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless2

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-conformer-zh-stateless2-2023-05-23.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-conformer-zh-stateless2-2023-05-23.tar.bz2

tar xvf sherpa-onnx-conformer-zh-stateless2-2023-05-23.tar.bz2
rm sherpa-onnx-conformer-zh-stateless2-2023-05-23.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The sizes of the *.onnx files should match the listing below.

sherpa-onnx-conformer-zh-stateless2-2023-05-23 fangjun$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff    11M May 23 15:29 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff    12M May 23 15:29 decoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   122M May 23 15:30 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   315M May 23 15:31 encoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   2.7M May 23 15:29 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff    11M May 23 15:29 joiner-epoch-99-avg-1.onnx

Decode wave files

Hint

It supports decoding only wave files with a single channel and 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz; the input is resampled automatically when necessary.
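If you are unsure whether a wave file meets these requirements, you can inspect it with Python's standard wave module. Below is a minimal sketch; the file name demo.wav and the helper wav_summary are only illustrative, not part of sherpa-onnx:

```python
import wave

def wav_summary(path):
    """Return (channels, bits_per_sample, sample_rate) of a wave file."""
    with wave.open(path, "rb") as f:
        return f.getnchannels(), 8 * f.getsampwidth(), f.getframerate()

# Create a 1-second, mono, 16-bit, 8 kHz file of silence for demonstration.
with wave.open("demo.wav", "wb") as f:
    f.setnchannels(1)     # single channel
    f.setsampwidth(2)     # 2 bytes per sample = 16-bit encoding
    f.setframerate(8000)
    f.writeframes(b"\x00\x00" * 8000)

print(wav_summary("demo.wav"))  # (1, 16, 8000)
```

A (1, 16, 8000) file like this one is decodable; its 8 kHz audio is resampled internally.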

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/tokens.txt \
  --encoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/0.wav \
  ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/1.wav \
  ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/2.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/tokens.txt --encoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/encoder-epoch-99-avg-1.onnx --decoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/decoder-epoch-99-avg-1.onnx --joiner=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/joiner-epoch-99-avg-1.onnx ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/0.wav ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/1.wav ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/2.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/encoder-epoch-99-avg-1.onnx", decoder_filename="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/decoder-epoch-99-avg-1.onnx", joiner_filename="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/joiner-epoch-99-avg-1.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), tokens="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/tokens.txt", num_threads=2, debug=False, provider="cpu"), lm_config=OfflineLMConfig(model="", scale=0.5), decoding_method="greedy_search", max_active_paths=4)
Creating recognizer ...
Started
Done!

./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/0.wav
{"text":"对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢","timestamps":"[0.00, 0.12, 0.44, 0.64, 0.84, 1.04, 1.64, 1.72, 1.88, 2.08, 2.28, 2.44, 2.56, 2.76, 3.08, 3.20, 3.32, 3.48, 3.64, 3.76, 3.88, 4.00, 4.16, 4.24, 4.44, 4.60, 4.84]","tokens":["对","我","做","了","介","绍","那","么","我","想","说","的","是","呢","大","家","如","果","对","我","的","研","究","感","兴","趣","呢"]}
----
./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/1.wav
{"text":"重点想谈三个问题首先呢就是这一轮全球金融动荡的表现","timestamps":"[0.00, 0.12, 0.48, 0.64, 0.88, 1.08, 1.28, 1.48, 1.80, 2.12, 2.40, 2.56, 2.68, 2.88, 3.04, 3.16, 3.36, 3.56, 3.68, 3.84, 4.00, 4.16, 4.32, 4.56, 4.76]","tokens":["重","点","想","谈","三","个","问","题","首","先","呢","就","是","这","一","轮","全","球","金","融","动","荡","的","表","现"]}
----
./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/2.wav
{"text":"深入地分析这一次全球金融动荡背后的根源","timestamps":"[0.00, 0.16, 0.60, 0.88, 1.08, 1.36, 1.64, 1.84, 2.24, 2.52, 2.72, 2.92, 3.08, 3.24, 3.40, 3.56, 3.72, 3.88, 4.12]","tokens":["深","入","地","分","析","这","一","次","全","球","金","融","动","荡","背","后","的","根","源"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.596 s
Real time factor (RTF): 0.596 / 15.289 = 0.039
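Each recognized file is reported as one JSON line. Note that the timestamps field is itself a string holding a bracketed list, not a JSON array, so it needs a second parsing step. A small sketch, using a shortened result taken from the output above:

```python
import json

# A shortened result line in the format printed above; "timestamps" is a
# string containing a bracketed list, not a JSON array.
line = ('{"text":"深入地分","timestamps":"[0.00, 0.16, 0.60, 0.88]",'
        '"tokens":["深","入","地","分"]}')

result = json.loads(line)
# Strip the brackets and split to recover the per-token times in seconds.
times = [float(t) for t in result["timestamps"].strip("[]").split(", ")]
for token, t in zip(result["tokens"], times):
    print(f"{t:5.2f}  {token}")
```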

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/tokens.txt \
  --encoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/0.wav \
  ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/1.wav \
  ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/2.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

We did not use the int8 model for the decoder above; only the encoder and joiner are quantized to int8.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/tokens.txt --encoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/encoder-epoch-99-avg-1.int8.onnx --decoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/decoder-epoch-99-avg-1.onnx --joiner=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/joiner-epoch-99-avg-1.int8.onnx ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/0.wav ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/1.wav ./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/2.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/encoder-epoch-99-avg-1.int8.onnx", decoder_filename="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/decoder-epoch-99-avg-1.onnx", joiner_filename="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/joiner-epoch-99-avg-1.int8.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), tokens="./sherpa-onnx-conformer-zh-stateless2-2023-05-23/tokens.txt", num_threads=2, debug=False, provider="cpu"), lm_config=OfflineLMConfig(model="", scale=0.5), decoding_method="greedy_search", max_active_paths=4)
Creating recognizer ...
Started
Done!

./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/0.wav
{"text":"对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢","timestamps":"[0.00, 0.12, 0.44, 0.64, 0.84, 1.08, 1.64, 1.72, 1.88, 2.08, 2.28, 2.44, 2.56, 2.76, 3.08, 3.20, 3.32, 3.48, 3.64, 3.76, 3.88, 4.00, 4.16, 4.24, 4.48, 4.60, 4.84]","tokens":["对","我","做","了","介","绍","那","么","我","想","说","的","是","呢","大","家","如","果","对","我","的","研","究","感","兴","趣","呢"]}
----
./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/1.wav
{"text":"重点想谈三个问题首先呢就是这一轮全球金融动荡的表现","timestamps":"[0.00, 0.08, 0.48, 0.64, 0.88, 1.08, 1.28, 1.48, 1.80, 2.08, 2.40, 2.56, 2.68, 2.88, 3.04, 3.16, 3.36, 3.56, 3.68, 3.84, 4.00, 4.16, 4.32, 4.56, 4.76]","tokens":["重","点","想","谈","三","个","问","题","首","先","呢","就","是","这","一","轮","全","球","金","融","动","荡","的","表","现"]}
----
./sherpa-onnx-conformer-zh-stateless2-2023-05-23/test_wavs/2.wav
{"text":"深入地分析这一次全球金融动荡背后的根源","timestamps":"[0.00, 0.12, 0.56, 0.84, 1.08, 1.40, 1.64, 1.84, 2.24, 2.52, 2.72, 2.92, 3.08, 3.24, 3.40, 3.56, 3.72, 3.88, 4.12]","tokens":["深","入","地","分","析","这","一","次","全","球","金","融","动","荡","背","后","的","根","源"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.439 s
Real time factor (RTF): 0.439 / 15.289 = 0.029
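The real time factor (RTF) reported above is simply the decoding time divided by the total audio duration; values below 1 mean faster than real time. Plugging in the numbers reported above for this model:

```python
def real_time_factor(elapsed_s, audio_s):
    """RTF = processing time / audio duration; < 1.0 is faster than real time."""
    return elapsed_s / audio_s

audio_s = 15.289                        # total duration of the three test waves
rtf_fp32 = real_time_factor(0.596, audio_s)
rtf_int8 = real_time_factor(0.439, audio_s)
print(f"fp32: {rtf_fp32:.3f}, int8: {rtf_int8:.3f}")  # fp32: 0.039, int8: 0.029
print(f"int8 speedup: {0.596 / 0.439:.2f}x")          # int8 speedup: 1.36x
```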

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/tokens.txt \
  --encoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-zh-stateless2-2023-05-23/joiner-epoch-99-avg-1.onnx

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

csukuangfj/sherpa-onnx-conformer-zh-2023-05-23 (Chinese)

This model is converted from

https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless5_offline

which supports only Chinese as it is trained on the WenetSpeech corpus.

You can find the training code at

https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless5

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-conformer-zh-2023-05-23.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-conformer-zh-2023-05-23.tar.bz2

tar xvf sherpa-onnx-conformer-zh-2023-05-23.tar.bz2
rm sherpa-onnx-conformer-zh-2023-05-23.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The sizes of the *.onnx files should match the listing below.

sherpa-onnx-conformer-zh-2023-05-23 fangjun$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff    11M May 23 13:45 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff    12M May 23 13:45 decoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   129M May 23 13:47 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   345M May 23 13:48 encoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   2.7M May 23 13:45 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff    11M May 23 13:45 joiner-epoch-99-avg-1.onnx

Decode wave files

Hint

It supports decoding only wave files with a single channel and 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz; the input is resampled automatically when necessary.

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-conformer-zh-2023-05-23/tokens.txt \
  --encoder=./sherpa-onnx-conformer-zh-2023-05-23/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-conformer-zh-2023-05-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-zh-2023-05-23/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/0.wav \
  ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/1.wav \
  ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/2.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --tokens=./sherpa-onnx-conformer-zh-2023-05-23/tokens.txt --encoder=./sherpa-onnx-conformer-zh-2023-05-23/encoder-epoch-99-avg-1.onnx --decoder=./sherpa-onnx-conformer-zh-2023-05-23/decoder-epoch-99-avg-1.onnx --joiner=./sherpa-onnx-conformer-zh-2023-05-23/joiner-epoch-99-avg-1.onnx ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/0.wav ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/1.wav ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/2.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-conformer-zh-2023-05-23/encoder-epoch-99-avg-1.onnx", decoder_filename="./sherpa-onnx-conformer-zh-2023-05-23/decoder-epoch-99-avg-1.onnx", joiner_filename="./sherpa-onnx-conformer-zh-2023-05-23/joiner-epoch-99-avg-1.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), tokens="./sherpa-onnx-conformer-zh-2023-05-23/tokens.txt", num_threads=2, debug=False, provider="cpu"), lm_config=OfflineLMConfig(model="", scale=0.5), decoding_method="greedy_search", max_active_paths=4)
Creating recognizer ...
Started
Done!

./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/0.wav
{"text":"对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢","timestamps":"[0.00, 0.12, 0.52, 0.64, 0.84, 1.04, 1.68, 1.80, 1.92, 2.12, 2.32, 2.48, 2.64, 2.76, 3.08, 3.20, 3.44, 3.52, 3.64, 3.76, 3.88, 4.00, 4.16, 4.32, 4.48, 4.64, 4.84]","tokens":["对","我","做","了","介","绍","那","么","我","想","说","的","是","呢","大","家","如","果","对","我","的","研","究","感","兴","趣","呢"]}
----
./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/1.wav
{"text":"重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现","timestamps":"[0.04, 0.16, 0.36, 0.48, 0.68, 0.92, 1.08, 1.24, 1.44, 1.84, 2.08, 2.36, 2.52, 2.68, 2.88, 3.04, 3.16, 3.40, 3.56, 3.72, 3.84, 4.04, 4.16, 4.32, 4.56, 4.76]","tokens":["重","点","呢","想","谈","三","个","问","题","首","先","呢","就","是","这","一","轮","全","球","金","融","动","荡","的","表","现"]}
----
./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/2.wav
{"text":"深度地分析这一次全球金融动荡背后的根源","timestamps":"[0.00, 0.12, 0.60, 0.84, 1.04, 1.44, 1.68, 1.84, 2.28, 2.52, 2.80, 2.92, 3.08, 3.24, 3.40, 3.60, 3.72, 3.84, 4.12]","tokens":["深","度","地","分","析","这","一","次","全","球","金","融","动","荡","背","后","的","根","源"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.706 s
Real time factor (RTF): 0.706 / 15.289 = 0.046

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-conformer-zh-2023-05-23/tokens.txt \
  --encoder=./sherpa-onnx-conformer-zh-2023-05-23/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-conformer-zh-2023-05-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-zh-2023-05-23/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/0.wav \
  ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/1.wav \
  ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/2.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

Caution

We did not use the int8 model for the decoder above; only the encoder and joiner are quantized to int8.

You should see the following output:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-offline --decoding-method=greedy_search --tokens=./sherpa-onnx-conformer-zh-2023-05-23/tokens.txt --encoder=./sherpa-onnx-conformer-zh-2023-05-23/encoder-epoch-99-avg-1.int8.onnx --decoder=./sherpa-onnx-conformer-zh-2023-05-23/decoder-epoch-99-avg-1.onnx --joiner=./sherpa-onnx-conformer-zh-2023-05-23/joiner-epoch-99-avg-1.int8.onnx ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/0.wav ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/1.wav ./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/2.wav 

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-conformer-zh-2023-05-23/encoder-epoch-99-avg-1.int8.onnx", decoder_filename="./sherpa-onnx-conformer-zh-2023-05-23/decoder-epoch-99-avg-1.onnx", joiner_filename="./sherpa-onnx-conformer-zh-2023-05-23/joiner-epoch-99-avg-1.int8.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), tokens="./sherpa-onnx-conformer-zh-2023-05-23/tokens.txt", num_threads=2, debug=False, provider="cpu"), lm_config=OfflineLMConfig(model="", scale=0.5), decoding_method="greedy_search", max_active_paths=4)
Creating recognizer ...
Started
Done!

./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/0.wav
{"text":"对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢","timestamps":"[0.00, 0.12, 0.52, 0.64, 0.84, 1.04, 1.68, 1.80, 1.92, 2.08, 2.32, 2.48, 2.64, 2.76, 3.08, 3.20, 3.44, 3.52, 3.64, 3.76, 3.88, 4.00, 4.16, 4.32, 4.48, 4.60, 4.84]","tokens":["对","我","做","了","介","绍","那","么","我","想","说","的","是","呢","大","家","如","果","对","我","的","研","究","感","兴","趣","呢"]}
----
./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/1.wav
{"text":"重点呢想谈三个问题首先呢就是这一轮全球金融动荡的表现","timestamps":"[0.04, 0.16, 0.36, 0.48, 0.68, 0.92, 1.08, 1.24, 1.44, 1.88, 2.08, 2.36, 2.52, 2.64, 2.88, 3.00, 3.16, 3.40, 3.56, 3.72, 3.84, 4.04, 4.20, 4.32, 4.56, 4.76]","tokens":["重","点","呢","想","谈","三","个","问","题","首","先","呢","就","是","这","一","轮","全","球","金","融","动","荡","的","表","现"]}
----
./sherpa-onnx-conformer-zh-2023-05-23/test_wavs/2.wav
{"text":"深度地分析这一次全球金融动荡背后的根源","timestamps":"[0.00, 0.12, 0.60, 0.84, 1.04, 1.44, 1.64, 1.84, 2.28, 2.52, 2.80, 2.92, 3.08, 3.28, 3.36, 3.60, 3.72, 3.84, 4.12]","tokens":["深","度","地","分","析","这","一","次","全","球","金","融","动","荡","背","后","的","根","源"]}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.502 s
Real time factor (RTF): 0.502 / 15.289 = 0.033

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-conformer-zh-2023-05-23/tokens.txt \
  --encoder=./sherpa-onnx-conformer-zh-2023-05-23/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-conformer-zh-2023-05-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-zh-2023-05-23/joiner-epoch-99-avg-1.onnx

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

csukuangfj/sherpa-onnx-conformer-en-2023-03-18 (English)

This model is converted from

https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

which supports only English as it is trained on the LibriSpeech corpus.

You can find the training code at

https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-conformer-en-2023-03-18.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-conformer-en-2023-03-18.tar.bz2

tar xvf sherpa-onnx-conformer-en-2023-03-18.tar.bz2
rm sherpa-onnx-conformer-en-2023-03-18.tar.bz2

Please check that the file sizes of the pre-trained models are correct. The sizes of the *.onnx files should match the listing below.

sherpa-onnx-en-2023-03-18$ ls -lh *.onnx
-rw-r--r-- 1 kuangfangjun root  1.3M Apr  1 07:02 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root  2.0M Apr  1 07:02 decoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 kuangfangjun root  122M Apr  1 07:02 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root  315M Apr  1 07:02 encoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 kuangfangjun root  254K Apr  1 07:02 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root 1003K Apr  1 07:02 joiner-epoch-99-avg-1.onnx

Decode wave files

Hint

It supports decoding only wave files with a single channel and 16-bit encoded samples. The sampling rate, however, does not need to be 16 kHz; the input is resampled automatically when necessary.

fp32

The following code shows how to use fp32 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-conformer-en-2023-03-18/tokens.txt \
  --encoder=./sherpa-onnx-conformer-en-2023-03-18/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-conformer-en-2023-03-18/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-en-2023-03-18/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-conformer-en-2023-03-18/test_wavs/0.wav \
  ./sherpa-onnx-conformer-en-2023-03-18/test_wavs/1.wav \
  ./sherpa-onnx-conformer-en-2023-03-18/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-conformer-en-2023-03-18/encoder-epoch-99-avg-1.onnx", decoder_filename="./sherpa-onnx-conformer-en-2023-03-18/decoder-epoch-99-avg-1.onnx", joiner_filename="./sherpa-onnx-conformer-en-2023-03-18/joiner-epoch-99-avg-1.onnx"), paraformer=OfflineParaformerModelConfig(model=""), tokens="./sherpa-onnx-conformer-en-2023-03-18/tokens.txt", num_threads=2, debug=False), decoding_method="greedy_search")
Creating recognizer ...
2023-04-01 07:11:51.666456713 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 608379, index: 15, mask: {16, 52, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-04-01 07:11:51.666458525 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 608380, index: 16, mask: {17, 53, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
Started
Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-conformer-en-2023-03-18/test_wavs/0.wav
 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
----
./sherpa-onnx-conformer-en-2023-03-18/test_wavs/1.wav
 GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
----
./sherpa-onnx-conformer-en-2023-03-18/test_wavs/8k.wav
 YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 2.264 s
Real time factor (RTF): 2.264 / 28.165 = 0.080
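The log above shows that a resampler is created for 8k.wav, since its 8 kHz sampling rate differs from the model's expected 16 kHz. As a rough illustration of what such an upsampling step does, here is a toy linear-interpolation sketch; the real implementation uses a higher-quality filter-based resampler, so this is only conceptual:

```python
def resample_linear(samples, in_rate, out_rate):
    """Toy resampler: linearly interpolate samples to a new rate."""
    if in_rate == out_rate:
        return list(samples)
    n_out = int(len(samples) * out_rate / in_rate)
    out = []
    for i in range(n_out):
        pos = i * in_rate / out_rate        # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Upsampling 8 kHz -> 16 kHz doubles the number of samples.
doubled = resample_linear([0.0, 1.0, 0.0, -1.0], 8000, 16000)
print(len(doubled))  # 8
```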

int8

The following code shows how to use int8 models to decode wave files:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-conformer-en-2023-03-18/tokens.txt \
  --encoder=./sherpa-onnx-conformer-en-2023-03-18/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-conformer-en-2023-03-18/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-en-2023-03-18/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-conformer-en-2023-03-18/test_wavs/0.wav \
  ./sherpa-onnx-conformer-en-2023-03-18/test_wavs/1.wav \
  ./sherpa-onnx-conformer-en-2023-03-18/test_wavs/8k.wav

Note

Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.

You should see the following output:

OfflineRecognizerConfig(feat_config=OfflineFeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="./sherpa-onnx-conformer-en-2023-03-18/encoder-epoch-99-avg-1.int8.onnx", decoder_filename="./sherpa-onnx-conformer-en-2023-03-18/decoder-epoch-99-avg-1.onnx", joiner_filename="./sherpa-onnx-conformer-en-2023-03-18/joiner-epoch-99-avg-1.int8.onnx"), paraformer=OfflineParaformerModelConfig(model=""), tokens="./sherpa-onnx-conformer-en-2023-03-18/tokens.txt", num_threads=2, debug=False), decoding_method="greedy_search")
Creating recognizer ...
2023-04-01 07:13:26.514109433 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 608419, index: 15, mask: {16, 52, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-04-01 07:13:26.514112711 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 608420, index: 16, mask: {17, 53, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
Started
Creating a resampler:
   in_sample_rate: 8000
   output_sample_rate: 16000

Done!

./sherpa-onnx-conformer-en-2023-03-18/test_wavs/0.wav
 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
----
./sherpa-onnx-conformer-en-2023-03-18/test_wavs/1.wav
 GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
----
./sherpa-onnx-conformer-en-2023-03-18/test_wavs/8k.wav
 YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 1.370 s
Real time factor (RTF): 1.370 / 28.165 = 0.049

Speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone-offline \
  --tokens=./sherpa-onnx-conformer-en-2023-03-18/tokens.txt \
  --encoder=./sherpa-onnx-conformer-en-2023-03-18/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-conformer-en-2023-03-18/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-conformer-en-2023-03-18/joiner-epoch-99-avg-1.onnx