Go API
In this section, we describe how to use the Go API of sherpa-onnx.
The Go API of sherpa-onnx supports both streaming and non-streaming speech recognition.
The following table lists some Go API examples:
Description | URL
Decode a file with non-streaming models | https://github.com/k2-fsa/sherpa-onnx/tree/master/go-api-examples/non-streaming-decode-files
Decode a file with streaming models | https://github.com/k2-fsa/sherpa-onnx/tree/master/go-api-examples/streaming-decode-files
Real-time speech recognition from a microphone | https://github.com/k2-fsa/sherpa-onnx/tree/master/go-api-examples/real-time-speech-recognition-from-microphone
Note that we provide pre-built libraries for Go, so you don’t need to build sherpa-onnx yourself when using the Go API.
To make supporting multiple platforms easier, we split the Go API of sherpa-onnx into multiple packages, as listed in the following table:
OS | Package name | Supported Arch
Linux | https://github.com/k2-fsa/sherpa-onnx-go-linux | x86_64, aarch64, arm
macOS | https://github.com/k2-fsa/sherpa-onnx-go-macos | x86_64, aarch64
Windows | https://github.com/k2-fsa/sherpa-onnx-go-windows | x86_64, i686
To simplify the usage, we provide a single Go package for sherpa-onnx that supports multiple operating systems. It can be found at https://github.com/k2-fsa/sherpa-onnx-go.
You can use the following import to add sherpa-onnx-go to your Go project:
import (
sherpa "github.com/k2-fsa/sherpa-onnx-go/sherpa_onnx"
)
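Once the package is imported, non-streaming (offline) decoding follows a simple pattern: build a config, create a recognizer, create a stream, feed the samples, decode, and read the result. The sketch below outlines that flow. The type, field, and function names (e.g. OfflineRecognizerConfig, NewOfflineRecognizer, ReadWave) are assumptions based on the non-streaming example and should be checked against the example source in go-api-examples/non-streaming-decode-files; the model paths are placeholders.
package main

import (
	"log"

	sherpa "github.com/k2-fsa/sherpa-onnx-go/sherpa_onnx"
)

func main() {
	// Sketch only: the names below are assumed from the non-streaming example.
	config := sherpa.OfflineRecognizerConfig{}
	config.FeatConfig.SampleRate = 16000
	config.FeatConfig.FeatureDim = 80
	config.ModelConfig.Transducer.Encoder = "./encoder.onnx" // placeholder path
	config.ModelConfig.Transducer.Decoder = "./decoder.onnx" // placeholder path
	config.ModelConfig.Transducer.Joiner = "./joiner.onnx"   // placeholder path
	config.ModelConfig.Tokens = "./tokens.txt"               // placeholder path
	config.ModelConfig.NumThreads = 1
	config.ModelConfig.ModelType = "transducer"
	config.DecodingMethod = "greedy_search"

	// Create the recognizer once; it can be reused to decode many streams.
	recognizer := sherpa.NewOfflineRecognizer(&config)
	defer sherpa.DeleteOfflineRecognizer(recognizer)

	// Read a 16-bit, single-channel WAV file.
	wave := sherpa.ReadWave("./test.wav") // placeholder path
	if wave == nil {
		log.Fatal("Failed to read ./test.wav")
	}

	// One stream per file/utterance.
	stream := sherpa.NewOfflineStream(recognizer)
	defer sherpa.DeleteOfflineStream(stream)

	stream.AcceptWaveform(wave.SampleRate, wave.Samples)
	recognizer.Decode(stream)

	log.Println(stream.GetResult().Text)
}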
In the following, we describe how to run our provided Go API examples.
Note
Before you continue, please make sure you have installed Go. If not, please follow https://go.dev/doc/install to install Go.
Hint
You need to enable cgo (e.g., by setting CGO_ENABLED=1) to build sherpa-onnx-go.
Decode files with non-streaming models
First, let us build the example:
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx/go-api-examples/non-streaming-decode-files
go mod tidy
go build
./non-streaming-decode-files --help
You will find the following output:
Usage of ./non-streaming-decode-files:
--debug int Whether to show debug message
--decoder string Path to the decoder model
--decoding-method string Decoding method. Possible values: greedy_search, modified_beam_search (default "greedy_search")
--encoder string Path to the encoder model
--joiner string Path to the joiner model
--lm-model string Optional. Path to the LM model
--lm-scale float32 Optional. Scale for the LM model (default 1)
--max-active-paths int Used only when --decoding-method is modified_beam_search (default 4)
--model-type string Optional. Used for loading the model in a faster way
--nemo-ctc string Path to the NeMo CTC model
--num-threads int Number of threads for computing (default 1)
--paraformer string Path to the paraformer model
--provider string Provider to use (default "cpu")
--tokens string Path to the tokens file
pflag: help requested
Congratulations! You have successfully built your first Go API example for speech recognition.
Note
If you are using Windows and don’t see any output after running ./non-streaming-decode-files --help, please copy *.dll from https://github.com/k2-fsa/sherpa-onnx-go-windows/tree/master/lib/x86_64-pc-windows-gnu (for Win64) or https://github.com/k2-fsa/sherpa-onnx-go-windows/tree/master/lib/i686-pc-windows-gnu (for Win32) to the directory sherpa-onnx/go-api-examples/non-streaming-decode-files.
Now let us refer to Pre-trained models to download a non-streaming model.
We give several examples below for demonstration.
Non-streaming transducer
We will use csukuangfj/sherpa-onnx-zipformer-en-2023-06-26 (English) as an example.
First, let us download it:
cd sherpa-onnx/go-api-examples/non-streaming-decode-files
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-en-2023-06-26.tar.bz2
tar xvf sherpa-onnx-zipformer-en-2023-06-26.tar.bz2
rm sherpa-onnx-zipformer-en-2023-06-26.tar.bz2
Now we can use:
./non-streaming-decode-files \
--encoder ./sherpa-onnx-zipformer-en-2023-06-26/encoder-epoch-99-avg-1.onnx \
--decoder ./sherpa-onnx-zipformer-en-2023-06-26/decoder-epoch-99-avg-1.onnx \
--joiner ./sherpa-onnx-zipformer-en-2023-06-26/joiner-epoch-99-avg-1.onnx \
--tokens ./sherpa-onnx-zipformer-en-2023-06-26/tokens.txt \
--model-type transducer \
./sherpa-onnx-zipformer-en-2023-06-26/test_wavs/0.wav
It should give you the following output:
2023/08/10 14:52:48.723098 Reading ./sherpa-onnx-zipformer-en-2023-06-26/test_wavs/0.wav
2023/08/10 14:52:48.741042 Initializing recognizer (may take several seconds)
2023/08/10 14:52:51.998848 Recognizer created!
2023/08/10 14:52:51.998870 Start decoding!
2023/08/10 14:52:52.258818 Decoding done!
2023/08/10 14:52:52.258847 after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
2023/08/10 14:52:52.258952 Wave duration: 6.625 seconds
Non-streaming paraformer
We will use csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28 (Chinese + English) as an example.
First, let us download it:
cd sherpa-onnx/go-api-examples/non-streaming-decode-files
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
rm sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
Now we can use:
./non-streaming-decode-files \
--paraformer ./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx \
--tokens ./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
--model-type paraformer \
./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
It should give you the following output:
2023/08/10 15:07:10.745412 Reading ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
2023/08/10 15:07:10.758414 Initializing recognizer (may take several seconds)
2023/08/10 15:07:13.992424 Recognizer created!
2023/08/10 15:07:13.992441 Start decoding!
2023/08/10 15:07:14.382157 Decoding done!
2023/08/10 15:07:14.382847 对我做了介绍啊那么我想说的是呢大家如果对我的研究感兴趣呢你
2023/08/10 15:07:14.382898 Wave duration: 5.614625 seconds
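If you want to decode with a paraformer model from Go rather than from the command line, only the model part of the config in the earlier sketch changes. The field names below (ModelConfig.Paraformer.Model, ModelConfig.ModelType) are assumptions inferred from the flags above, not a verified listing:
	// Hypothetical paraformer variant of the earlier offline sketch.
	config := sherpa.OfflineRecognizerConfig{}
	config.FeatConfig.SampleRate = 16000
	config.FeatConfig.FeatureDim = 80
	config.ModelConfig.Paraformer.Model = "./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx"
	config.ModelConfig.Tokens = "./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt"
	config.ModelConfig.ModelType = "paraformer"
	config.DecodingMethod = "greedy_search"
	// Recognizer and stream handling are identical to the transducer sketch.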
Non-streaming CTC model from NeMo
We will use stt_en_conformer_ctc_medium as an example.
First, let us download it:
cd sherpa-onnx/go-api-examples/non-streaming-decode-files
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-ctc-en-conformer-medium.tar.bz2
tar xvf sherpa-onnx-nemo-ctc-en-conformer-medium.tar.bz2
rm sherpa-onnx-nemo-ctc-en-conformer-medium.tar.bz2
Now we can use:
./non-streaming-decode-files \
--nemo-ctc ./sherpa-onnx-nemo-ctc-en-conformer-medium/model.onnx \
--tokens ./sherpa-onnx-nemo-ctc-en-conformer-medium/tokens.txt \
--model-type nemo_ctc \
./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/0.wav
It should give you the following output:
2023/08/10 15:11:48.667693 Reading ./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/0.wav
2023/08/10 15:11:48.680855 Initializing recognizer (may take several seconds)
2023/08/10 15:11:51.900852 Recognizer created!
2023/08/10 15:11:51.900869 Start decoding!
2023/08/10 15:11:52.125605 Decoding done!
2023/08/10 15:11:52.125630 after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
2023/08/10 15:11:52.125645 Wave duration: 6.625 seconds
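The same applies to the NeMo CTC model: only the model section of the config changes. Again, the field name below (ModelConfig.NemoCTC.Model) is an assumption; please verify it against the example source:
	// Hypothetical NeMo CTC variant of the earlier offline sketch.
	config := sherpa.OfflineRecognizerConfig{}
	config.FeatConfig.SampleRate = 16000
	config.FeatConfig.FeatureDim = 80
	config.ModelConfig.NemoCTC.Model = "./sherpa-onnx-nemo-ctc-en-conformer-medium/model.onnx" // exact field name may differ
	config.ModelConfig.Tokens = "./sherpa-onnx-nemo-ctc-en-conformer-medium/tokens.txt"
	config.ModelConfig.ModelType = "nemo_ctc"
	config.DecodingMethod = "greedy_search"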
Decode files with streaming models
First, let us build the example:
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx/go-api-examples/streaming-decode-files
go mod tidy
go build
./streaming-decode-files --help
You will find the following output:
Usage of ./streaming-decode-files:
--debug int Whether to show debug message
--decoder string Path to the decoder model
--decoding-method string Decoding method. Possible values: greedy_search, modified_beam_search (default "greedy_search")
--encoder string Path to the encoder model
--joiner string Path to the joiner model
--max-active-paths int Used only when --decoding-method is modified_beam_search (default 4)
--model-type string Optional. Used for loading the model in a faster way
--num-threads int Number of threads for computing (default 1)
--provider string Provider to use (default "cpu")
--tokens string Path to the tokens file
pflag: help requested
Note
If you are using Windows and don’t see any output after running ./streaming-decode-files --help, please copy *.dll from https://github.com/k2-fsa/sherpa-onnx-go-windows/tree/master/lib/x86_64-pc-windows-gnu (for Win64) or https://github.com/k2-fsa/sherpa-onnx-go-windows/tree/master/lib/i686-pc-windows-gnu (for Win32) to the directory sherpa-onnx/go-api-examples/streaming-decode-files.
Now let us refer to Pre-trained models to download a streaming model.
We give one example below for demonstration.
Streaming transducer
We will use csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26 (English) as an example.
First, let us download it:
cd sherpa-onnx/go-api-examples/streaming-decode-files
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
rm sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
Now we can use:
./streaming-decode-files \
--encoder ./sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx \
--decoder ./sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx \
--joiner ./sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx \
--tokens ./sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt \
--model-type zipformer2 \
./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/0.wav
It should give you the following output:
2023/08/10 15:17:00.226228 Reading ./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/0.wav
2023/08/10 15:17:00.241024 Initializing recognizer (may take several seconds)
2023/08/10 15:17:03.352697 Recognizer created!
2023/08/10 15:17:03.352711 Start decoding!
2023/08/10 15:17:04.057130 Decoding done!
2023/08/10 15:17:04.057215 after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
2023/08/10 15:17:04.057235 Wave duration: 6.625 seconds
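Inside a Go program, streaming decoding follows the same configure/create/decode pattern, but uses the online recognizer so samples can be fed in chunks as they arrive. The sketch below feeds a whole file at once for simplicity; the type and function names (OnlineRecognizerConfig, NewOnlineRecognizer, IsReady, GetResult, etc.) are assumptions based on the streaming example and should be checked against the example source in go-api-examples/streaming-decode-files.
package main

import (
	"log"

	sherpa "github.com/k2-fsa/sherpa-onnx-go/sherpa_onnx"
)

func main() {
	// Sketch only: the names below are assumed from the streaming example.
	config := sherpa.OnlineRecognizerConfig{}
	config.FeatConfig.SampleRate = 16000
	config.FeatConfig.FeatureDim = 80
	config.ModelConfig.Transducer.Encoder = "./encoder-epoch-99-avg-1-chunk-16-left-128.onnx"
	config.ModelConfig.Transducer.Decoder = "./decoder-epoch-99-avg-1-chunk-16-left-128.onnx"
	config.ModelConfig.Transducer.Joiner = "./joiner-epoch-99-avg-1-chunk-16-left-128.onnx"
	config.ModelConfig.Tokens = "./tokens.txt"
	config.ModelConfig.NumThreads = 1
	config.ModelConfig.ModelType = "zipformer2"
	config.DecodingMethod = "greedy_search"

	recognizer := sherpa.NewOnlineRecognizer(&config)
	defer sherpa.DeleteOnlineRecognizer(recognizer)

	stream := sherpa.NewOnlineStream(recognizer)
	defer sherpa.DeleteOnlineStream(stream)

	wave := sherpa.ReadWave("./test.wav") // placeholder path
	if wave == nil {
		log.Fatal("Failed to read ./test.wav")
	}

	// In a real streaming application the samples arrive chunk by chunk;
	// here we feed the whole file at once and then signal end of input.
	stream.AcceptWaveform(wave.SampleRate, wave.Samples)
	stream.InputFinished()

	// Decode whenever enough frames are available.
	for recognizer.IsReady(stream) {
		recognizer.Decode(stream)
	}

	log.Println(recognizer.GetResult(stream).Text)
}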
Real-time speech recognition from microphone
Hint
You need to install portaudio for this example.
# for macOS
brew install portaudio
export PKG_CONFIG_PATH=/usr/local/Cellar/portaudio/19.7.0
# for Ubuntu
sudo apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0
To check that you have installed portaudio successfully, please run:
pkg-config --cflags --libs portaudio-2.0
It should give you something like below:
# for macOS
-I/usr/local/Cellar/portaudio/19.7.0/include -L/usr/local/Cellar/portaudio/19.7.0/lib -lportaudio -framework CoreAudio -framework AudioToolbox -framework AudioUnit -framework CoreFoundation -framework CoreServices

# for Ubuntu
-pthread -lportaudio -lasound -lm -lpthread
First, let us build the example:
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx/go-api-examples/real-time-speech-recognition-from-microphone
go mod tidy
go build
./real-time-speech-recognition-from-microphone --help
You will find the following output:
Select default input device: MacBook Pro Microphone
Usage of ./real-time-speech-recognition-from-microphone:
--debug int Whether to show debug message
--decoder string Path to the decoder model
--decoding-method string Decoding method. Possible values: greedy_search, modified_beam_search (default "greedy_search")
--enable-endpoint int Whether to enable endpoint (default 1)
--encoder string Path to the encoder model
--joiner string Path to the joiner model
--max-active-paths int Used only when --decoding-method is modified_beam_search (default 4)
--model-type string Optional. Used for loading the model in a faster way
--num-threads int Number of threads for computing (default 1)
--provider string Provider to use (default "cpu")
--rule1-min-trailing-silence float32 Threshold for rule1 (default 2.4)
--rule2-min-trailing-silence float32 Threshold for rule2 (default 1.2)
--rule3-min-utterance-length float32 Threshold for rule3 (default 20)
--tokens string Path to the tokens file
pflag: help requested
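The endpoint-related flags above correspond to fields of the online recognizer configuration. Below is a minimal sketch, assuming the field and method names used in the microphone example (EnableEndpoint, Rule1MinTrailingSilence, IsEndpoint, Reset); please verify them against the example source.
	// Hypothetical endpoint settings added to the streaming config from the earlier sketch.
	config.EnableEndpoint = 1            // 1 to enable endpoint detection, 0 to disable
	config.Rule1MinTrailingSilence = 2.4 // seconds of trailing silence when nothing has been decoded yet
	config.Rule2MinTrailingSilence = 1.2 // seconds of trailing silence after something has been decoded
	config.Rule3MinUtteranceLength = 20  // end the utterance once it exceeds this length in seconds

	// Inside the decoding loop, an endpoint marks the end of an utterance,
	// after which the stream is reset for the next one:
	if recognizer.IsEndpoint(stream) {
		recognizer.Reset(stream)
	}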
Now let us refer to Pre-trained models to download a streaming model.
We give one example below for demonstration.
Streaming transducer
We will use csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26 (English) as an example.
First, let us download it:
cd sherpa-onnx/go-api-examples/real-time-speech-recognition-from-microphone
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
rm sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
Now we can use:
./real-time-speech-recognition-from-microphone \
--encoder ./sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx \
--decoder ./sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx \
--joiner ./sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx \
--tokens ./sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt \
--model-type zipformer2
It should give you the following output:
Select default input device: MacBook Pro Microphone
2023/08/10 15:22:00 Initializing recognizer (may take several seconds)
2023/08/10 15:22:03 Recognizer created!
Started! Please speak
0: this is the first test
1: this is the second
Colab
We provide a Colab notebook for you to try the Go API examples of sherpa-onnx.