Conv-Emformer-transducer-based Models
Hint
Please refer to Installation to install sherpa-ncnn before you read this section.
marcoyang/sherpa-ncnn-conv-emformer-transducer-small-2023-01-09 (English)
This model is a small version of conv-emformer-transducer trained in icefall.
It only has 8.8 million parameters
and can be deployed on embedded devices
for real-time speech recognition. You can find the models in fp16
and int8
format
at https://huggingface.co/marcoyang/sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.
This model is trained using LibriSpeech and thus it supports only English.
In the following, we show you how to download it and deploy it with sherpa-ncnn on an embedded device, whose CPU is RV1126 (Quad core ARM Cortex-A7)
Please use the following commands to download it.
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2
tar xvf sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2
Note
Please refer to Embedded Linux (arm) for how to compile sherpa-ncnn for a 32-bit ARM platform. In the following, we test the pre-trained model on an embedded device, whose CPU is RV1126 (Quad core ARM Cortex-A7).
Decode a single wave file with ./build/bin/sherpa-ncnn
Hint
It supports decoding only wave files with a single channel and the sampling rate should be 16 kHz.
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav \
The outputs are shown below. The CPU used for decoding is RV1126 (Quad core ARM Cortex-A7).
Note
The default option uses 4 threads and greedy_search
for decoding.
Note
Please use ./build/bin/Release/sherpa-ncnn.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
Decode a single wave file with ./build/bin/sherpa-ncnn (with int8 quantization)
Note
We also support int8 quantization to compresss the model and speed up inference. Currently, only encoder and joiner are quantized.
To decode the int8-quantized model, use the following command:
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav \
The outputs are shown below. The CPU used for decoding is RV1126 (Quad core ARM Cortex-A7).
Compared to the original model in fp16
format,
the decoding speed is significantly improved. The decoding time is changed from
3.26 s
to 2.44 s
.
Note
When the model’s weights are quantized to float16
, it is converted
to float32
during computation.
When the model’s weights are quantized to int8
, it is using int8
during computation.
Hint
Even if we use only 1 thread for the int8
model, the resulting real
time factor (RTF) is still less than 1
.
csukuangfj/sherpa-ncnn-conv-emformer-transducer-2022-12-06 (Chinese + English)
This model is converted from https://huggingface.co/ptrnull/icefall-asr-conv-emformer-transducer-stateless2-zh, which supports both Chinese and English.
Hint
If you want to train your own model that is able to support both Chinese and English, please refer to our training code:
You can also try the pre-trained models in your browser without installing anything by visiting:
In the following, we describe how to download and use it with sherpa-ncnn.
Please use the following commands to download it.
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-conv-emformer-transducer-2022-12-06.tar.bz2
tar xvf sherpa-ncnn-conv-emformer-transducer-2022-12-06.tar.bz2
Decode a single wave file with ./build/bin/sherpa-ncnn
Hint
It supports decoding only wave files with a single channel and the sampling rate should be 16 kHz.
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/test_wavs/0.wav \
Note
Please use ./build/bin/Release/sherpa-ncnn.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
Real-time speech recognition from a microphone with build/bin/sherpa-ncnn-microphone
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn-microphone \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.bin
Note
Please use ./build/bin/Release/sherpa-ncnn-microphone.exe
for Windows.
It will print something like below:
Number of threads: 4
num devices: 4
Use default device: 2
Name: MacBook Pro Microphone
Max input channels: 1
Started
Speak and it will show you the recognition result in real-time.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
csukuangfj/sherpa-ncnn-conv-emformer-transducer-2022-12-08 (Chinese)
Hint
This is a very small model that can be run in real-time on embedded sytems.
This model is trained using WenetSpeech dataset and it supports only Chinese.
In the following, we describe how to download and use it with sherpa-ncnn.
Please use the following commands to download it.
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-conv-emformer-transducer-2022-12-08.tar.bz2
tar xvf sherpa-ncnn-conv-emformer-transducer-2022-12-08.tar.bz2
Decode a single wave file with ./build/bin/sherpa-ncnn
Hint
It supports decoding only wave files with a single channel and the sampling rate should be 16 kHz.
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/test_wavs/0.wav
Note
Please use ./build/bin/Release/sherpa-ncnn.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
Real-time speech recognition from a microphone with build/bin/sherpa-ncnn-microphone
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn-microphone \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-08/joiner_jit_trace-pnnx.ncnn.bin
Note
Please use ./build/bin/Release/sherpa-ncnn-microphone.exe
for Windows.
It will print something like below:
Number of threads: 4
num devices: 4
Use default device: 2
Name: MacBook Pro Microphone
Max input channels: 1
Started
Speak and it will show you the recognition result in real-time.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
csukuangfj/sherpa-ncnn-conv-emformer-transducer-2022-12-04 (English)
This model is trained using GigaSpeech and LibriSpeech. It supports only English.
In the following, we describe how to download and use it with sherpa-ncnn.
Please use the following commands to download it.
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-conv-emformer-transducer-2022-12-04.tar.bz2
tar xvf sherpa-ncnn-conv-emformer-transducer-2022-12-04.tar.bz2
Decode a single wave file with ./build/bin/sherpa-ncnn
Hint
It supports decoding only wave files with a single channel and the sampling rate should be 16 kHz.
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/test_wavs/1089-134686-0001.wav
Note
Please use ./build/bin/Release/sherpa-ncnn.exe
for Windows.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.
Real-time speech recognition from a microphone with build/bin/sherpa-ncnn-microphone
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn-microphone \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-2022-12-04/joiner_jit_trace-pnnx.ncnn.param
Note
Please use ./build/bin/Release/sherpa-ncnn-microphone.exe
for Windows.
It will print something like below:
Number of threads: 4
num devices: 4
Use default device: 2
Name: MacBook Pro Microphone
Max input channels: 1
Started
Speak and it will show you the recognition result in real-time.
Caution
If you use Windows and get encoding issues, please run:
CHCP 65001
in your commandline.