LSTM-transducer-based Models
Hint
Please refer to Installation to install sherpa-ncnn before you read this section.
marcoyang/sherpa-ncnn-lstm-transducer-small-2023-02-13 (Bilingual, Chinese + English)
This model is a small version of the lstm-transducer trained in icefall.
It has only 13.3 million parameters and can be deployed on embedded devices
for real-time speech recognition. The model files in fp16 format are available
at https://huggingface.co/marcoyang/sherpa-ncnn-lstm-transducer-small-2023-02-13.
The model is trained on the bilingual dataset tal_csasr (Chinese + English),
so it can recognize both Chinese and English speech.
In the following, we show you how to download it and deploy it with sherpa-ncnn.
Please use the following commands to download it.
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
tar xvf sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
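After extracting the archive, it can help to confirm that every file used by the commands in this section is present. The following is a minimal sketch: the function name check_model_dir is hypothetical, and the file list is taken from the command lines on this page, not from an official manifest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that an extracted model directory contains
# the files that the sherpa-ncnn commands on this page pass as arguments.
check_model_dir() {
  local dir=$1 missing=0 f
  for f in tokens.txt \
           encoder_jit_trace-pnnx.ncnn.param encoder_jit_trace-pnnx.ncnn.bin \
           decoder_jit_trace-pnnx.ncnn.param decoder_jit_trace-pnnx.ncnn.bin \
           joiner_jit_trace-pnnx.ncnn.param joiner_jit_trace-pnnx.ncnn.bin; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 means all files were found, 1 means at least one is missing.
  return "$missing"
}

# Example:
#   check_model_dir ./sherpa-ncnn-lstm-transducer-small-2023-02-13 \
#     && echo "model files look complete"
```

An incomplete download or an interrupted tar extraction shows up here as one "missing:" line per absent file.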
Note
Please refer to Embedded Linux (arm) for how to compile sherpa-ncnn for a 32-bit ARM platform.
Decode a single wave file with ./build/bin/sherpa-ncnn
Hint
It supports decoding only single-channel (mono) wave files with a sampling rate of 16 kHz.
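Before decoding your own recordings, you may want to check that a file meets this requirement. The sketch below reads the channel count and sampling rate directly from the WAV header with od. It assumes the common layout in which the fmt chunk immediately follows the RIFF header, so the channel count sits at byte offset 22 and the sampling rate at offset 24; files with extra chunks before fmt would be misread, and a tool such as soxi or ffprobe is more robust if you have one installed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Read a little-endian unsigned integer of <len> bytes at <offset> in <file>.
read_le_uint() {
  local file=$1 offset=$2 len=$3 value=0 shift_bits=0 byte
  # od prints each byte as a decimal number; combine them, lowest byte first.
  for byte in $(od -An -v -t u1 -j "$offset" -N "$len" "$file"); do
    value=$(( value + (byte << shift_bits) ))
    shift_bits=$(( shift_bits + 8 ))
  done
  echo "$value"
}

# Return 0 if the header describes a mono, 16 kHz file; explain why otherwise.
# Assumes the fmt chunk starts at byte 12 (see the note above).
check_wav() {
  local file=$1 channels rate
  channels=$(read_le_uint "$file" 22 2)
  rate=$(read_le_uint "$file" 24 4)
  if [ "$channels" -ne 1 ] || [ "$rate" -ne 16000 ]; then
    echo "$file: channels=$channels, sample rate=$rate (expected 1 and 16000)" >&2
    return 1
  fi
  echo "$file: OK (mono, 16000 Hz)"
}

# Example:
#   check_wav ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
```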
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
Note
When the number of threads and the decoding method are omitted, as in the command above,
the defaults are 4 threads and greedy_search.
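Because the binary takes its inputs as a fixed sequence of positional arguments, a small wrapper can reduce typing mistakes. The following is only a sketch: decode_wav is a hypothetical name, and the argument order (tokens, encoder, decoder, joiner, wave file, then the optional number of threads and decoding method) is copied from the commands on this page.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: decode_wav <binary> <model_dir> <wav> [num_threads] [method]
# Expands a model directory into the positional arguments used on this page.
decode_wav() {
  local bin=$1 model_dir=$2 wav=$3
  shift 3
  # Any remaining arguments (number of threads, decoding method) are
  # passed through unchanged after the wave file.
  "$bin" \
    "$model_dir/tokens.txt" \
    "$model_dir/encoder_jit_trace-pnnx.ncnn.param" \
    "$model_dir/encoder_jit_trace-pnnx.ncnn.bin" \
    "$model_dir/decoder_jit_trace-pnnx.ncnn.param" \
    "$model_dir/decoder_jit_trace-pnnx.ncnn.bin" \
    "$model_dir/joiner_jit_trace-pnnx.ncnn.param" \
    "$model_dir/joiner_jit_trace-pnnx.ncnn.bin" \
    "$wav" \
    "$@"
}

# Example (defaults for threads and method):
#   m=./sherpa-ncnn-lstm-transducer-small-2023-02-13
#   decode_wav ./build/bin/sherpa-ncnn "$m" "$m/test_wavs/0.wav"
# Example (2 threads, modified_beam_search):
#   decode_wav ./build/bin/sherpa-ncnn "$m" "$m/test_wavs/0.wav" 2 modified_beam_search
```

Passing echo as the binary prints the expanded command line without running the recognizer, which is a convenient way to inspect the arguments.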
Note
On Windows, use ./build/bin/Release/sherpa-ncnn.exe instead.
Caution
If you are on Windows and the recognized text is displayed incorrectly, run:
CHCP 65001
in your command prompt to switch the console code page to UTF-8.
csukuangfj/sherpa-ncnn-2022-09-05 (English)
This model was trained on the GigaSpeech and LibriSpeech datasets.
Please see https://github.com/k2-fsa/icefall/pull/558 for how the model was trained.
You can find the training code at
https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2
In the following, we describe how to download it and use it with sherpa-ncnn.
Please use the following commands to download it.
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-2022-09-05.tar.bz2
tar xvf sherpa-ncnn-2022-09-05.tar.bz2
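The download commands for the three models on this page differ only in the model name, so they can be wrapped in a small helper. This is a sketch under two assumptions drawn from the commands on this page: the archive is named after the model, and it extracts into a directory of the same name. Archives published elsewhere may not follow this pattern. The names model_url and fetch_model are hypothetical.

```shell
#!/usr/bin/env bash
# Base URL shared by the download commands on this page.
MODELS_BASE_URL="https://github.com/k2-fsa/sherpa-ncnn/releases/download/models"

# Print the archive URL for a model name such as sherpa-ncnn-2022-09-05.
model_url() {
  echo "$MODELS_BASE_URL/$1.tar.bz2"
}

# Download and extract a model unless its directory already exists.
fetch_model() {
  local name=$1
  if [ -d "$name" ]; then
    echo "$name already exists, skipping download"
    return 0
  fi
  wget "$(model_url "$name")" && tar xvf "$name.tar.bz2"
}

# Example:
#   cd /path/to/sherpa-ncnn
#   fetch_model sherpa-ncnn-2022-09-05
```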
Decode a single wave file with ./build/bin/sherpa-ncnn
Hint
It supports decoding only single-channel (mono) wave files with a sampling rate of 16 kHz.
cd /path/to/sherpa-ncnn
for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-2022-09-05/tokens.txt \
    ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav \
    2 \
    $method
done
You should see the following output:
ModelConfig(encoder_param="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-05/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="greedy_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
ModelConfig(encoder_param="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-05/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="modified_beam_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
Note
On Windows, use ./build/bin/Release/sherpa-ncnn.exe instead.
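The output shown above follows a regular pattern: a DecoderConfig line names the decoding method, and the transcript is the line that follows "Recognition result for ...". When comparing decoding methods, it can be convenient to reduce the log to one line per run. The awk filter below is a sketch based only on the output format shown on this page; extract_results is a hypothetical name, and other versions of sherpa-ncnn may print a different format.

```shell
#!/usr/bin/env bash
# Read sherpa-ncnn output on stdin and print "<method><TAB><transcript>"
# for each run found in it.
extract_results() {
  awk '
    # Remember the decoding method from the DecoderConfig line.
    /^DecoderConfig\(method="/ {
      method = $0
      sub(/^DecoderConfig\(method="/, "", method)
      sub(/".*/, "", method)
    }
    # The transcript is the line right after "Recognition result for ...".
    grab { print method "\t" $0; grab = 0; next }
    /^Recognition result for / { grab = 1 }
  '
}

# Example: this page does not say whether the log goes to stdout or stderr,
# so both streams are captured before filtering.
#   for method in greedy_search modified_beam_search; do
#     ./build/bin/sherpa-ncnn <model and wave arguments> 2 $method 2>&1
#   done | extract_results
```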
Real-time speech recognition from a microphone with ./build/bin/sherpa-ncnn-microphone
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn-microphone \
  ./sherpa-ncnn-2022-09-05/tokens.txt \
  ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin \
  2 \
  greedy_search
Note
On Windows, use ./build/bin/Release/sherpa-ncnn-microphone.exe instead.
It will print something like the following:
Number of threads: 4
num devices: 4
Use default device: 2
Name: MacBook Pro Microphone
Max input channels: 1
Started
Speak, and it will show the recognition results in real time.
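The notes in this section give two locations for the binaries: ./build/bin/ on Linux and macOS, and ./build/bin/Release/ with an .exe suffix on Windows. A script that should run on both can pick the path at run time. The sketch below assumes that on Windows you are using a Unix-like shell such as Git Bash, MSYS2, or Cygwin, where uname -s reports a name starting with MINGW, MSYS, or CYGWIN. The function name sherpa_bin_path is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a sherpa-ncnn binary
# (for example sherpa-ncnn or sherpa-ncnn-microphone) for this platform.
# The optional second argument overrides the detected kernel name.
sherpa_bin_path() {
  local name=$1
  local kernel="${2:-$(uname -s)}"
  case "$kernel" in
    MINGW*|MSYS*|CYGWIN*) echo "./build/bin/Release/$name.exe" ;;
    *)                    echo "./build/bin/$name" ;;
  esac
}

# Example:
#   "$(sherpa_bin_path sherpa-ncnn-microphone)" <model arguments> 2 greedy_search
```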
csukuangfj/sherpa-ncnn-2022-09-30 (Chinese)
This is a model trained using the WenetSpeech dataset.
Please see https://github.com/k2-fsa/icefall/pull/595 for how the model was trained.
In the following, we describe how to download it and use it with sherpa-ncnn.
Please use the following commands to download it.
cd /path/to/sherpa-ncnn
wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-2022-09-30.tar.bz2
tar xvf sherpa-ncnn-2022-09-30.tar.bz2
Decode a single wave file with ./build/bin/sherpa-ncnn
Hint
It supports decoding only single-channel (mono) wave files with a sampling rate of 16 kHz.
cd /path/to/sherpa-ncnn
for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-2022-09-30/tokens.txt \
    ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/test_wavs/0.wav \
    2 \
    $method
done
You should see the following output:
ModelConfig(encoder_param="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-30/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="greedy_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
wav duration (s): 5.61462
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣
ModelConfig(encoder_param="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-30/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="modified_beam_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
wav duration (s): 5.61462
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣
Caution
If you are on Windows and the recognized text is displayed incorrectly, run:
CHCP 65001
in your command prompt to switch the console code page to UTF-8.
Real-time speech recognition from a microphone with ./build/bin/sherpa-ncnn-microphone
cd /path/to/sherpa-ncnn
./build/bin/sherpa-ncnn-microphone \
  ./sherpa-ncnn-2022-09-30/tokens.txt \
  ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin \
  2 \
  greedy_search
Note
On Windows, use ./build/bin/Release/sherpa-ncnn-microphone.exe instead.
Caution
If you are on Windows and the recognized text is displayed incorrectly, run:
CHCP 65001
in your command prompt to switch the console code page to UTF-8.