LSTM-transducer-based Models

Hint

Please refer to Installation to install sherpa-ncnn before you read this section.

marcoyang/sherpa-ncnn-lstm-transducer-small-2023-02-13 (Bilingual, Chinese + English)

This model is a small LSTM transducer model trained with icefall.

It has only 13.3 million parameters and can be deployed on embedded devices for real-time speech recognition. The model files, in fp16 format, are available at https://huggingface.co/marcoyang/sherpa-ncnn-lstm-transducer-small-2023-02-13.

The model was trained on the bilingual (Chinese + English) dataset tal_csasr, so it can recognize both Chinese and English.
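As a rough guide for embedded targets, the weight size follows from the parameter count, because fp16 stores each value in 2 bytes. The sketch below assumes every parameter is stored in fp16 and ignores file headers and any tensors kept in other formats, so the downloaded .bin files may differ somewhat. The name fp16_size_mb is a hypothetical helper, not part of sherpa-ncnn.

```shell
# Hypothetical helper: estimate fp16 weight size in MB (10^6 bytes).
# Assumes 2 bytes per parameter; file headers and non-fp16 tensors are ignored.
fp16_size_mb() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 2 / 1e6 }'
}

# 13.3 million parameters, as stated for this model.
fp16_size_mb 13300000   # prints 26.6
```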

In the following, we show you how to download it and deploy it with sherpa-ncnn.

Please use the following commands to download it.

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
tar xvf sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
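Before running the decoder, you may want to confirm that the archive extracted all seven model files that the commands in this section pass to sherpa-ncnn. The following is a minimal bash sketch; check_model_dir is a hypothetical helper, and the file names are taken from the decoding commands on this page.

```shell
# Hypothetical helper: check_model_dir DIR
# Returns 0 if all seven files passed to sherpa-ncnn exist in DIR;
# otherwise prints each missing file to stderr and returns 1.
check_model_dir() {
  local dir=$1 missing=0 f
  for f in tokens.txt \
           encoder_jit_trace-pnnx.ncnn.param encoder_jit_trace-pnnx.ncnn.bin \
           decoder_jit_trace-pnnx.ncnn.param decoder_jit_trace-pnnx.ncnn.bin \
           joiner_jit_trace-pnnx.ncnn.param joiner_jit_trace-pnnx.ncnn.bin; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   check_model_dir ./sherpa-ncnn-lstm-transducer-small-2023-02-13 && echo OK
```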

Note

Please refer to Embedded Linux (arm) for how to compile sherpa-ncnn for a 32-bit ARM platform.

Decode a single wave file with ./build/bin/sherpa-ncnn

Hint

It supports decoding only single-channel wave files with a sampling rate of 16 kHz.
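To check whether one of your own files meets this requirement, you can read the channel count and sample rate straight from the WAV header. The bash sketch below assumes the canonical layout in which the "fmt " chunk starts at byte 12 (channels at byte 22, sample rate at byte 24). Files with extra chunks before "fmt " would need a real parser. read_le and check_wav are hypothetical helpers.

```shell
# Hypothetical helper: read_le FILE OFFSET NBYTES
# Prints the unsigned little-endian integer stored at byte OFFSET in FILE.
read_le() {
  local file=$1 off=$2 n=$3 val=0 i=0 b
  for b in $(od -An -v -t u1 -j "$off" -N "$n" "$file"); do
    val=$(( val + (b << (8 * i)) ))
    i=$(( i + 1 ))
  done
  echo "$val"
}

# Hypothetical helper: check_wav FILE
# Prints the channel count and sample rate, and returns 0 only for a
# mono 16 kHz file. Assumes the canonical 44-byte WAV header layout.
check_wav() {
  local file=$1 channels rate
  channels=$(read_le "$file" 22 2)
  rate=$(read_le "$file" 24 4)
  echo "channels=$channels sample_rate=$rate"
  [ "$channels" -eq 1 ] && [ "$rate" -eq 16000 ]
}
```

If a file does not match, a tool such as ffmpeg can convert it, for example with `ffmpeg -i input.wav -ar 16000 -ac 1 output.wav` (input.wav and output.wav are placeholder names).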

cd /path/to/sherpa-ncnn

./build/bin/sherpa-ncnn \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav

Note

By default, decoding uses 4 threads and greedy_search.
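The model-file arguments always appear in the same order: tokens, then the .param/.bin pairs for the encoder, decoder and joiner. The commands for the other two models on this page also pass the number of threads and the decoding method after the wave file. The bash sketch below builds that argument list from a model directory. sherpa_ncnn_args is a hypothetical helper, and the example usage assumes this binary accepts the same trailing arguments for this model.

```shell
# Hypothetical helper (not part of sherpa-ncnn): sherpa_ncnn_args DIR [EXTRA...]
# Prints, one per line, the model-file arguments in the order used on this
# page, followed by any EXTRA arguments (wave file, num_threads, method).
sherpa_ncnn_args() {
  local dir=$1 part
  shift
  printf '%s\n' "$dir/tokens.txt"
  for part in encoder decoder joiner; do
    printf '%s\n' "$dir/${part}_jit_trace-pnnx.ncnn.param" \
                  "$dir/${part}_jit_trace-pnnx.ncnn.bin"
  done
  # Guard: printf with an empty argument list would emit a blank line.
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@"
  fi
}

# Example usage (mapfile needs bash 4 or newer):
#   dir=./sherpa-ncnn-lstm-transducer-small-2023-02-13
#   mapfile -t args < <(sherpa_ncnn_args "$dir" "$dir/test_wavs/0.wav" 2 modified_beam_search)
#   ./build/bin/sherpa-ncnn "${args[@]}"
```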

Note

Please use ./build/bin/Release/sherpa-ncnn.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

csukuangfj/sherpa-ncnn-2022-09-05 (English)

This model was trained on the GigaSpeech and LibriSpeech datasets.

Please see https://github.com/k2-fsa/icefall/pull/558 for how the model is trained.

You can find the training code at

https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2

In the following, we describe how to download it and use it with sherpa-ncnn.

Please use the following commands to download it.

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-2022-09-05.tar.bz2
tar xvf sherpa-ncnn-2022-09-05.tar.bz2
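Every model on this page is downloaded from the same release location, https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/, as NAME.tar.bz2. The bash sketch below wraps the two commands above. model_url and download_model are hypothetical helpers, and the sketch assumes each archive extracts into a directory named NAME, which matches the paths used in the commands on this page.

```shell
# Hypothetical helper: print the release URL for model NAME.
model_url() {
  printf 'https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/%s.tar.bz2\n' "$1"
}

# Hypothetical helper: download and extract NAME into the current
# directory, skipping the download if directory NAME already exists.
download_model() {
  local name=$1
  if [ -d "$name" ]; then
    echo "$name already exists, skipping" >&2
    return 0
  fi
  wget "$(model_url "$name")" && tar xvf "$name.tar.bz2"
}

# Example usage:
#   cd /path/to/sherpa-ncnn
#   download_model sherpa-ncnn-2022-09-05
```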

Decode a single wave file with ./build/bin/sherpa-ncnn

Hint

It supports decoding only single-channel wave files with a sampling rate of 16 kHz.

cd /path/to/sherpa-ncnn

for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-2022-09-05/tokens.txt \
    ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav \
    2 \
    $method
done

You should see the following output:

ModelConfig(encoder_param="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-05/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="greedy_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
ModelConfig(encoder_param="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-05/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="modified_beam_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav
 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

Note

Please use ./build/bin/Release/sherpa-ncnn.exe for Windows.
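In the output above, both decoding methods produce the same transcript for this file. To compare methods on your own files, you can capture the output and pull out only the transcript lines. The bash sketch below relies only on the "Recognition result for ..." line format shown above. extract_results is a hypothetical helper, and the usage captures both stdout and stderr because we do not assume which stream sherpa-ncnn writes to.

```shell
# Hypothetical helper: read captured sherpa-ncnn output on stdin and print
# the transcript line that follows each "Recognition result for ..." line,
# with leading spaces and tabs removed.
extract_results() {
  awk 'found { sub(/^[ \t]+/, ""); print; found = 0; next }
       /^Recognition result for / { found = 1 }'
}

# Example usage (put the model arguments from the loop above in place of
# MODEL_ARGS); a single line after sort -u means both methods agree:
#   for method in greedy_search modified_beam_search; do
#     ./build/bin/sherpa-ncnn MODEL_ARGS "$method"
#   done > log.txt 2>&1
#   extract_results < log.txt | sort -u
```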

Real-time speech recognition from a microphone with ./build/bin/sherpa-ncnn-microphone

cd /path/to/sherpa-ncnn

./build/bin/sherpa-ncnn-microphone \
  ./sherpa-ncnn-2022-09-05/tokens.txt \
  ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin \
  2 \
  greedy_search

Note

Please use ./build/bin/Release/sherpa-ncnn-microphone.exe for Windows.

It will print something like the following:

Number of threads: 4
num devices: 4
Use default device: 2
  Name: MacBook Pro Microphone
  Max input channels: 1
Started

Speak, and the recognition result will be displayed in real time.


csukuangfj/sherpa-ncnn-2022-09-30 (Chinese)

This is a model trained using the WenetSpeech dataset.

Please see https://github.com/k2-fsa/icefall/pull/595 for how the model is trained.

In the following, we describe how to download it and use it with sherpa-ncnn.

Please use the following commands to download it.

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-2022-09-30.tar.bz2
tar xvf sherpa-ncnn-2022-09-30.tar.bz2

Decode a single wave file with ./build/bin/sherpa-ncnn

Hint

It supports decoding only single-channel wave files with a sampling rate of 16 kHz.

cd /path/to/sherpa-ncnn

for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-2022-09-30/tokens.txt \
    ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/test_wavs/0.wav \
    2 \
    $method
done

You should see the following output:

ModelConfig(encoder_param="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-30/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="greedy_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
wav duration (s): 5.61462
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣
ModelConfig(encoder_param="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-2022-09-30/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2)
DecoderConfig(method="modified_beam_search", num_active_paths=4, enable_endpoint=False, endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)))
wav filename: ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
wav duration (s): 5.61462
Started!
Done!
Recognition result for ./sherpa-ncnn-2022-09-30/test_wavs/0.wav
对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

Real-time speech recognition from a microphone with ./build/bin/sherpa-ncnn-microphone

cd /path/to/sherpa-ncnn

./build/bin/sherpa-ncnn-microphone \
  ./sherpa-ncnn-2022-09-30/tokens.txt \
  ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin \
  2 \
  greedy_search

Note

Please use ./build/bin/Release/sherpa-ncnn-microphone.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.
