Streaming WebSocket server and client
Hint
Please refer to Installation to install sherpa-onnx before you read this section.
Build sherpa-onnx with WebSocket support
By default, it will generate the following binaries after Installation:
sherpa-onnx fangjun$ ls -lh build/bin/*websocket*
-rwxr-xr-x 1 fangjun staff 1.1M Mar 31 22:09 build/bin/sherpa-onnx-offline-websocket-server
-rwxr-xr-x 1 fangjun staff 1.0M Mar 31 22:09 build/bin/sherpa-onnx-online-websocket-client
-rwxr-xr-x 1 fangjun staff 1.2M Mar 31 22:09 build/bin/sherpa-onnx-online-websocket-server
Please refer to Non-streaming WebSocket server and client for the usage of sherpa-onnx-offline-websocket-server.
View the server usage
Before starting the server, let us view the help message of sherpa-onnx-online-websocket-server:
build/bin/sherpa-onnx-online-websocket-server
The above command will print the following help information:
Automatic speech recognition with sherpa-onnx using websocket.
Usage:
./bin/sherpa-onnx-online-websocket-server --help
./bin/sherpa-onnx-online-websocket-server \
--port=6006 \
--num-work-threads=5 \
--tokens=/path/to/tokens.txt \
--encoder=/path/to/encoder.onnx \
--decoder=/path/to/decoder.onnx \
--joiner=/path/to/joiner.onnx \
--log-file=./log.txt \
--max-batch-size=5 \
--loop-interval-ms=10
Please refer to
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
for a list of pre-trained models to download.
Options:
--max-batch-size : Max batch size for recognition. (int, default = 5)
--loop-interval-ms : It determines how often the decoder loop runs. (int, default = 10)
--max-active-paths : beam size used in modified beam search. (int, default = 4)
--decoding-method : Decoding method; currently greedy_search and modified_beam_search are supported. (string, default = "greedy_search")
--rule3-min-utterance-length : This endpointing rule3 requires utterance-length (in seconds) to be >= this value. (float, default = 20)
--rule3-min-trailing-silence : This endpointing rule3 requires duration of trailing silence (in seconds) to be >= this value. (float, default = 0)
--rule3-must-contain-nonsilence : If True, for this endpointing rule3 to apply there must be nonsilence in the best-path traceback. For decoding, a non-blank token is considered as non-silence (bool, default = false)
--rule2-min-utterance-length : This endpointing rule2 requires utterance-length (in seconds) to be >= this value. (float, default = 0)
--rule1-min-trailing-silence : This endpointing rule1 requires duration of trailing silence (in seconds) to be >= this value. (float, default = 2.4)
--feat-dim : Feature dimension. Must match the one expected by the model. (int, default = 80)
--rule1-must-contain-nonsilence : If True, for this endpointing rule1 to apply there must be nonsilence in the best-path traceback. For decoding, a non-blank token is considered as non-silence (bool, default = false)
--enable-endpoint : True to enable endpoint detection. False to disable it. (bool, default = true)
--num_threads : Number of threads to run the neural network (int, default = 2)
--debug : true to print model information while loading it. (bool, default = false)
--port : The port on which the server will listen. (int, default = 6006)
--num-io-threads : Thread pool size for network connections. (int, default = 1)
--rule2-must-contain-nonsilence : If True, for this endpointing rule2 to apply there must be nonsilence in the best-path traceback. For decoding, a non-blank token is considered as non-silence (bool, default = true)
--joiner : Path to joiner.onnx (string, default = "")
--tokens : Path to tokens.txt (string, default = "")
--num-work-threads : Thread pool size for neural network computation and decoding. (int, default = 3)
--encoder : Path to encoder.onnx (string, default = "")
--sample-rate : Sampling rate of the input waveform. Note: You can have a different sample rate for the input waveform. We will do resampling inside the feature extractor (int, default = 16000)
--rule2-min-trailing-silence : This endpointing rule2 requires duration of trailing silence (in seconds) to be >= this value. (float, default = 1.2)
--log-file : Path to the log file. Logs are appended to this file (string, default = "./log.txt")
--rule1-min-utterance-length : This endpointing rule1 requires utterance-length (in seconds) to be >= this value. (float, default = 0)
--decoder : Path to decoder.onnx (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
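The three endpointing rules above all share the same shape: a rule fires once the trailing silence and the utterance length both reach its thresholds, and (if required) the best decoding path contains non-silence. The following is a minimal Python sketch of that logic using the default values from the help message; the function and class names are illustrative, not the actual C++ implementation:

```python
from dataclasses import dataclass

@dataclass
class EndpointRule:
    must_contain_nonsilence: bool
    min_trailing_silence: float   # seconds
    min_utterance_length: float   # seconds

def rule_fires(rule, contains_nonsilence, trailing_silence, utterance_length):
    """A rule fires only if all of its conditions hold."""
    if rule.must_contain_nonsilence and not contains_nonsilence:
        return False
    return (trailing_silence >= rule.min_trailing_silence
            and utterance_length >= rule.min_utterance_length)

# Defaults from the help message above
rule1 = EndpointRule(False, 2.4, 0)   # long silence, speech not required
rule2 = EndpointRule(True, 1.2, 0)    # shorter silence after some speech
rule3 = EndpointRule(False, 0, 20)    # very long utterance, no silence needed

def is_endpoint(contains_nonsilence, trailing_silence, utterance_length):
    # An endpoint is detected as soon as any rule fires
    return any(rule_fires(r, contains_nonsilence, trailing_silence, utterance_length)
               for r in (rule1, rule2, rule3))

print(is_endpoint(True, 1.5, 5.0))   # rule2: speech seen, 1.5 s >= 1.2 s of silence
print(is_endpoint(False, 3.0, 3.0))  # rule1: 3.0 s >= 2.4 s of silence
```

Because `is_endpoint` is a disjunction of the rules, tightening one rule (e.g. raising --rule2-min-trailing-silence) never disables the others.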
Start the server
Hint
Please refer to Pre-trained models for a list of pre-trained models.
./build/bin/sherpa-onnx-online-websocket-server \
--port=6006 \
--num-work-threads=3 \
--num-io-threads=2 \
--tokens=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt \
--encoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.onnx \
--decoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx \
--joiner=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.onnx \
--log-file=./log.txt \
--max-batch-size=5 \
--loop-interval-ms=20
Caution
The arguments are in the form --key=value. It does not support --key value.
Hint
In the above demo, the model files are from csukuangfj/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 (Bilingual, Chinese + English).
Note
Note that the server supports processing multiple clients in a batch in parallel.
You can use --max-batch-size to limit the batch size.
View the usage of the client (C++)
Let us view the usage of the C++ WebSocket client:
./build/bin/sherpa-onnx-online-websocket-client
The above command will print the following help information:
[I] /Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:484:int sherpa_onnx::ParseOptions::Read(int, const char *const *) ./build/bin/sherpa-onnx-online-websocket-client
[I] /Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:525:void sherpa_onnx::ParseOptions::PrintUsage(bool) const
Automatic speech recognition with sherpa-onnx using websocket.
Usage:
./bin/sherpa-onnx-online-websocket-client --help
./bin/sherpa-onnx-online-websocket-client \
--server-ip=127.0.0.1 \
--server-port=6006 \
--samples-per-message=8000 \
--seconds-per-message=0.2 \
/path/to/foo.wav
It supports only wave files with a single channel, 16kHz sample rate, 16-bit samples.
Options:
--seconds-per-message : We will simulate that each message takes this number of seconds to send. If you select a very large value, it will take a long time to send all the samples (float, default = 0.2)
--samples-per-message : Send this number of samples per message. (int, default = 8000)
--sample-rate : Sample rate of the input wave. Should be the one expected by the server (int, default = 16000)
--server-port : Port of the websocket server (int, default = 6006)
--server-ip : IP address of the websocket server (string, default = "127.0.0.1")
Standard options:
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--config : Configuration file to read (this option may be repeated) (string, default = "")
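Note how --samples-per-message and --seconds-per-message interact: with the defaults, each message carries 8000 samples (0.5 s of 16 kHz audio) but is sent every 0.2 s, so the client streams audio faster than real time. A quick sketch of the arithmetic, using only the default values shown above:

```python
sample_rate = 16000          # --sample-rate
samples_per_message = 8000   # --samples-per-message
seconds_per_message = 0.2    # --seconds-per-message

# Seconds of audio carried by each message
audio_per_message = samples_per_message / sample_rate
# How much faster than real time the audio is streamed
speedup = audio_per_message / seconds_per_message

print(audio_per_message)  # 0.5
print(speedup)            # 2.5

# Time needed to stream a 10-second file: 20 messages, 0.2 s apart
num_messages = 10 * sample_rate / samples_per_message
print(num_messages * seconds_per_message)  # 4.0
```

Increasing --seconds-per-message slows the simulated stream down toward (or past) real time; it does not change how much audio each message carries.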
Caution
We only support using an IP address for --server-ip.

For instance, please don’t use --server-ip=localhost; use --server-ip=127.0.0.1 instead.
Start the client (C++)
To start the C++ WebSocket client, use:
build/bin/sherpa-onnx-online-websocket-client \
--seconds-per-message=0.1 \
--server-port=6006 \
./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/0.wav
Since the server is able to process multiple clients at the same time, you can use the following command to start multiple clients:
for i in $(seq 0 10); do
k=$(expr $i % 5)
build/bin/sherpa-onnx-online-websocket-client \
--seconds-per-message=0.1 \
--server-port=6006 \
./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/${k}.wav &
done
wait
echo "done"
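The shell loop above starts 11 clients and cycles through test_wavs/0.wav to test_wavs/4.wav via the modulo. An equivalent sketch in Python (subprocess-based; the binary and wave paths are the ones used above, and the existence check is only there so the sketch degrades gracefully when the binary is absent):

```python
import os
import subprocess

wav_dir = "./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs"

cmds = []
for i in range(11):  # same as: for i in $(seq 0 10)
    k = i % 5        # same as: k=$(expr $i % 5)
    cmds.append([
        "build/bin/sherpa-onnx-online-websocket-client",
        "--seconds-per-message=0.1",
        "--server-port=6006",
        f"{wav_dir}/{k}.wav",
    ])

if os.path.exists(cmds[0][0]):
    # Launch all clients concurrently, then wait (like `&` followed by `wait`)
    procs = [subprocess.Popen(c) for c in cmds]
    for p in procs:
        p.wait()
    print("done")
```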
View the usage of the client (Python)
Use the following command to view the usage:
python3 ./python-api-examples/online-websocket-client-decode-file.py --help
Hint
online-websocket-client-decode-file.py is from
https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/online-websocket-client-decode-file.py
It will print:
usage: online-websocket-client-decode-file.py [-h] [--server-addr SERVER_ADDR]
[--server-port SERVER_PORT]
[--samples-per-message SAMPLES_PER_MESSAGE]
[--seconds-per-message SECONDS_PER_MESSAGE]
sound_file
positional arguments:
sound_file The input sound file. Must be wave with a single
channel, 16kHz sampling rate, 16-bit of each sample.
optional arguments:
-h, --help show this help message and exit
--server-addr SERVER_ADDR
Address of the server (default: localhost)
--server-port SERVER_PORT
Port of the server (default: 6006)
--samples-per-message SAMPLES_PER_MESSAGE
Number of samples per message (default: 8000)
--seconds-per-message SECONDS_PER_MESSAGE
                      The simulated interval, in seconds, between two
                      consecutive messages (default: 0.1)
Hint
For the Python client, you can use either a domain name or an IP address for --server-addr. For instance, you can use either --server-addr localhost or --server-addr 127.0.0.1.

For the input argument, you can use either --key=value or --key value.
Start the client (Python)
python3 ./python-api-examples/online-websocket-client-decode-file.py \
--server-addr localhost \
--server-port 6006 \
--seconds-per-message 0.1 \
./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/4.wav
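Conceptually, the client reads the wave file and sends its samples in fixed-size chunks, pausing --seconds-per-message between messages. A simplified, stdlib-only sketch of the chunking step (illustrative only; the actual client code is in the script linked above):

```python
def chunk_samples(samples, samples_per_message=8000):
    """Split a flat list of audio samples into per-message chunks.

    The last chunk may be shorter than samples_per_message if the
    file length is not an exact multiple of the chunk size.
    """
    return [samples[i:i + samples_per_message]
            for i in range(0, len(samples), samples_per_message)]

# Simulate 2.5 s of 16 kHz audio (40000 samples)
samples = [0.0] * 40000
chunks = chunk_samples(samples)
print(len(chunks))       # 5
print(len(chunks[-1]))   # 8000

# A file that is not a multiple of the chunk size leaves a short tail
print([len(c) for c in chunk_samples([0.0] * 10000)])  # [8000, 2000]
```

In the real client each chunk is sent over the WebSocket connection, with a sleep of --seconds-per-message between sends to simulate streaming.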
Start the client (Python, with microphone)
python3 ./python-api-examples/online-websocket-client-microphone.py \
--server-addr localhost \
--server-port 6006
online-websocket-client-microphone.py is from
https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/online-websocket-client-microphone.py