

Before proceeding, please refer to Installation to install sherpa.

The server is responsible for accepting audio samples from the client, decoding them, and sending the recognition results back to the client.


cd /path/to/sherpa
./sherpa/bin/lstm_transducer_stateless/ --help

The commands above print the usage message.

You need the following files to start the server:

  1. The neural network model

  2. The tokens.txt file

The neural network model has three parts: the encoder, the decoder, and the joiner. All three are exported using torch.jit.trace.
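
Because the server needs all of these files at startup, a small pre-flight check can catch a wrong path before you launch it. The following is a minimal sketch and not part of sherpa: the function name check_server_files and the variables in the usage comment are hypothetical, and the check only confirms that each path exists and is non-empty. It does not validate that the files are usable models.

```shell
# Pre-flight check (hypothetical helper, not shipped with sherpa).
# Succeeds only if every path given as an argument is an existing, non-empty file.
check_server_files() {
  missing=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      # Report each problem on stderr so it does not mix with normal output.
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage (ENCODER, DECODER, JOINER and TOKENS are placeholder variables
# that you would set to your own file paths):
#   check_server_files "$ENCODER" "$DECODER" "$JOINER" "$TOKENS" && echo "all files found"
```

Passing the paths as arguments keeps the helper independent of any particular model directory layout.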

The above files can be obtained after training your model with

If you don’t want to train a model yourself, you can try the pretrained model:

The following shows you how to start the server with the above pretrained model.

cd /path/to/sherpa

git lfs install
git clone

./sherpa/bin/lstm_transducer_stateless/ \
  --endpoint.rule3.min-utterance-length 1000.0 \
  --port 6007 \
  --max-batch-size 50 \
  --max-wait-ms 5 \
  --nn-pool-size 1 \
  --nn-encoder-filename ./icefall-asr-wenetspeech-lstm-transducer-stateless-2022-09-19/exp/ \
  --nn-decoder-filename ./icefall-asr-wenetspeech-lstm-transducer-stateless-2022-09-19/exp/ \
  --nn-joiner-filename ./icefall-asr-wenetspeech-lstm-transducer-stateless-2022-09-19/exp/ \
  --token-filename ./icefall-asr-wenetspeech-lstm-transducer-stateless-2022-09-19/data/lang_char/tokens.txt

That’s it!

Now you can start the Client, record your voice in real time, and check the recognition results from the server.
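
Before starting the client, it can help to confirm that the server is accepting connections. The sketch below is a hypothetical helper, not a sherpa tool: port_is_open is an invented name, it relies on bash's /dev/tcp redirection (so it needs bash rather than plain sh), and the usage comment assumes the server is reachable on localhost at the port 6007 passed to the server above.

```shell
# Hypothetical helper: succeeds if something accepts TCP connections on HOST:PORT.
# Uses bash's /dev/tcp pseudo-device; the subshell closes the descriptor on exit.
port_is_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Usage (assumes the server runs on this machine with --port 6007):
#   port_is_open localhost 6007 && echo "server is listening"
```

A successful connection only shows that the port is open. It does not confirm that the model has finished loading.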


The above pretrained model has been trained for only 6 epochs. We will update it in the coming days.