Pre-trained models

In this section, we describe how to download and use all available keyword spotting pre-trained models.

Hint

Please install git-lfs before you continue.

Otherwise, you will be SAD later.

Hint

Please refer to Installation to install sherpa-onnx before you read this section.

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01 (Chinese)

Training code for this model can be found at https://github.com/k2-fsa/icefall/pull/1428. The model is trained on WenetSpeech L subset (10000 hours), it supports only Chinese.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
tar xf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
ls -lh sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01

The output is given below:

$ ls -lh sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01
total 18M
-rw-r--r--  1 kangwei root   48 Jan  1 21:45 configuration.json
-rw-r--r--  1 kangwei root 177K Jan 17 11:38 decoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx
-rw-r--r--  1 kangwei root 660K Jan  1 21:45 decoder-epoch-12-avg-2-chunk-16-left-64.onnx
-rw-r--r--  1 kangwei root 4.6M Jan 17 11:38 encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx
-rw-r--r--  1 kangwei root  12M Jan  1 21:45 encoder-epoch-12-avg-2-chunk-16-left-64.onnx
-rw-r--r--  1 kangwei root  64K Jan 17 11:38 joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx
-rw-r--r--  1 kangwei root 248K Jan  1 21:45 joiner-epoch-12-avg-2-chunk-16-left-64.onnx
-rw-r--r--  1 kangwei root  101 Jan  8 17:14 keywords_raw.txt
-rw-r--r--  1 kangwei root  286 Jan  8 17:14 keywords.txt
-rw-r--r--  1 kangwei root  750 Jan  8 17:14 README.md
drwxr-xr-x 10 kangwei root    0 Jan 15 22:52 test_wavs
-rw-r--r--  1 kangwei root 1.6K Jan  1 21:45 tokens.txt

Test the model

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-keyword-spotter \
  --encoder=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.onnx \
  --decoder=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.onnx \
  --joiner=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.onnx \
  --tokens=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/tokens.txt \
  --keywords-file=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt  \
  sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/3.wav \
  sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/4.wav \
  sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/5.wav

Note

Please use ./build/bin/Release/sherpa-onnx-keyword-spotter.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your commandline.

You should see the following output:

KeywordSpotterConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="sherpa-on$x-kws-zipformer-wenetspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.onnx", decoder="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk$16-left-64.onnx", joiner="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.onnx"), paraformer=OnlineParaformerModelConfig(encoder="", deco$er=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), tokens="sherpa-onnx-kws-zipformer-we$etspeech-3.3M-2024-01-01/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), max_active_paths=4, num_trailing_blanks=1, keywords_score=1, keywords_threshold=0.25 keywords_file="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt")

2024-01-19 12:32:29.983790275 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3385848, index: 15, mask: {16, 52, }, error code: 22 error msg: Invali$
 argument. Specify the number of threads explicitly so the affinity is not set.
2024-01-19 12:32:29.983792055 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3385849, index: 16, mask: {17, 53, }, error code: 22 error msg: Invali$
 argument. Specify the number of threads explicitly so the affinity is not set.
sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/4.wav
{"start_time":0.00, "keyword": "蒋友伯", "timestamps": [0.64, 0.68, 0.84, 0.96, 1.12, 1.16], "tokens":["j", "iǎng", "y", "ǒu", "b", "ó"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/5.wav
{"start_time":0.00, "keyword": "周望军", "timestamps": [0.64, 0.68, 0.76, 0.84, 1.00, 1.04], "tokens":["zh", "ōu", "w", "àng", "j", "ūn"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/3.wav
{"start_time":0.00, "keyword": "文森特卡索", "timestamps": [0.32, 0.72, 0.96, 1.00, 1.20, 1.32, 1.48, 1.60, 1.88, 1.92], "tokens":["w", "én", "s", "ēn", "t", "è", "k", "ǎ", "s", "uǒ"$
}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/5.wav
{"start_time":0.00, "keyword": "落实", "timestamps": [1.76, 1.92, 2.12, 2.20], "tokens":["l", "uò", "sh", "í"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/4.wav
{"start_time":0.00, "keyword": "女儿", "timestamps": [3.08, 3.20, 3.24], "tokens":["n", "ǚ", "ér"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/3.wav
{"start_time":0.00, "keyword": "法国", "timestamps": [4.56, 4.64, 4.80, 4.88], "tokens":["f", "ǎ", "g", "uó"]}

int8

The following code shows how to use int8 models:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-keyword-spotter \
  --encoder=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx \
  --decoder=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx \
  --joiner=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx \
  --tokens=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/tokens.txt \
  --keywords-file=sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt  \
  sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/3.wav \
  sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/4.wav \
  sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/5.wav
KeywordSpotterConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx", decoder="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx", joiner="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx"), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), tokens="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), max_active_paths=4, num_trailing_blanks=1, keywords_score=1, keywords_threshold=0.25, keywords_file="sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt")

2024-01-19 12:36:44.635979490 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3391918, index: 15, mask: {16, 52, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-01-19 12:36:44.635981379 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3391919, index: 16, mask: {17, 53, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/4.wav
{"start_time":0.00, "keyword": "蒋友伯", "timestamps": [0.64, 0.68, 0.84, 0.96, 1.12, 1.16], "tokens":["j", "iǎng", "y", "ǒu", "b", "ó"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/5.wav
{"start_time":0.00, "keyword": "周望军", "timestamps": [0.64, 0.68, 0.76, 0.84, 1.00, 1.08], "tokens":["zh", "ōu", "w", "àng", "j", "ūn"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/3.wav
{"start_time":0.00, "keyword": "文森特卡索", "timestamps": [0.32, 0.72, 0.96, 1.04, 1.28, 1.32, 1.52, 1.60, 1.92, 1.96], "tokens":["w", "én", "s", "ēn", "t", "è", "k", "ǎ", "s", "uǒ"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/5.wav
{"start_time":0.00, "keyword": "落实", "timestamps": [1.80, 1.92, 2.12, 2.20], "tokens":["l", "uò", "sh", "í"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/4.wav
{"start_time":0.00, "keyword": "女儿", "timestamps": [3.08, 3.20, 3.24], "tokens":["n", "ǚ", "ér"]}

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/3.wav
{"start_time":0.00, "keyword": "法国", "timestamps": [4.56, 4.64, 4.80, 4.88], "tokens":["f", "ǎ", "g", "uó"]}

Customize your own keywords

To customize your own keywords, the only thing you need to do is replacing the --keywords-file. The keywords file is generated as follows:

For example your keywords are (keywords_raw.txt):

你好军哥 @你好军哥
你好问问 @你好问问
小爱同学 @小爱同学

Run the following command:

sherpa-onnx-cli text2token \
  --tokens sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/tokens.txt \
  --tokens-type ppinyin \
  keywords_raw.txt keywords.txt

The keywords.txt looks like:

n ǐ h ǎo j ūn g ē @你好军哥
n ǐ h ǎo w èn w èn @你好问问
x iǎo ài t óng x  @小爱同学

Note

If you install sherpa-onnx from sources (i.e. not by pip), you can use the alternative script in scripts, the usage is almost the same as the command line tool, read the help information by:

python3 scripts/text2token.py --help

sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01 (English)

Training code for this model can be found at https://github.com/k2-fsa/icefall/pull/1428. The model is trained on GigaSpeech XL subset (10000 hours), it supports only English.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz2
tar xvf sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz2
rm sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz2
ls -lh sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01

The output is given below:

$ ls -lh sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01
total 19M
-rw-r--r-- 1 kangwei root 240K Jan 19 15:25 bpe.model
-rw-r--r-- 1 kangwei root   48 Jan 19 15:25 configuration.json
-rw-r--r-- 1 kangwei root 272K Jan 19 15:25 decoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx
-rw-r--r-- 1 kangwei root 1.1M Jan 19 15:25 decoder-epoch-12-avg-2-chunk-16-left-64.onnx
-rw-r--r-- 1 kangwei root 4.6M Jan 19 15:25 encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx
-rw-r--r-- 1 kangwei root  12M Jan 19 15:25 encoder-epoch-12-avg-2-chunk-16-left-64.onnx
-rw-r--r-- 1 kangwei root 160K Jan 19 15:25 joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx
-rw-r--r-- 1 kangwei root 628K Jan 19 15:25 joiner-epoch-12-avg-2-chunk-16-left-64.onnx
-rw-r--r-- 1 kangwei root  102 Jan 19 15:25 keywords_raw.txt
-rw-r--r-- 1 kangwei root  184 Jan 19 15:25 keywords.txt
-rw-r--r-- 1 kangwei root  743 Jan 19 15:25 README.md
drwxr-xr-x 6 kangwei root    0 Jan 19 15:25 test_wavs
-rw-r--r-- 1 kangwei root 4.9K Jan 19 15:25 tokens.txt

Test the model

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-keyword-spotter \
  --encoder=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.onnx \
  --decoder=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.onnx \
  --joiner=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.onnx \
  --tokens=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/tokens.txt \
  --keywords-file=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt  \
  sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/0.wav \
  sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/1.wav

Note

Please use ./build/bin/Release/sherpa-onnx-keyword-spotter.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your commandline.

You should see the following output:

KeywordSpotterConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.onnx", decoder="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.onnx", joiner="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.onnx"), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), tokens="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), max_active_paths=4, num_trailing_blanks=1, keywords_score=1, keywords_threshold=0.25, keywords_file="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt")
2024-01-19 15:32:46.420331393 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3492733, index: 16, mask: {17, 53, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-01-19 15:32:46.420332978 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3492732, index: 15, mask: {16, 52, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/0.wav
{"start_time":0.00, "keyword": "LIGHT UP", "timestamps": [3.04, 3.08, 3.12, 3.20], "tokens":[" ", "L", "IGHT", " UP"]}

sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/1.wav
{"start_time":0.00, "keyword": "LOVELY CHILD", "timestamps": [5.44, 5.56, 5.84, 6.00, 6.04], "tokens":[" LOVE", "LY", " CHI", "L", "D"]}

sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/1.wav
{"start_time":0.00, "keyword": "FOREVER", "timestamps": [10.88, 11.04, 11.08], "tokens":[" FOR", "E", "VER"]}

int8

The following code shows how to use int8 models:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-keyword-spotter \
  --encoder=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx \
  --decoder=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx \
  --joiner=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx \
  --tokens=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/tokens.txt \
  --keywords-file=sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt  \
  sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/0.wav \
  sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/1.wav
KeywordSpotterConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx", decoder="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx", joiner="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx"), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), tokens="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), max_active_paths=4, num_trailing_blanks=1, keywords_score=1, keywords_threshold=0.25, keywords_file="sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt")
2024-01-19 15:31:39.743344642 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3492115, index: 15, mask: {16, 52, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-01-19 15:31:39.743346583 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3492116, index: 16, mask: {17, 53, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/0.wav
{"start_time":0.00, "keyword": "LIGHT UP", "timestamps": [3.04, 3.08, 3.12, 3.16], "tokens":[" ", "L", "IGHT", " UP"]}

sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/1.wav
{"start_time":0.00, "keyword": "LOVELY CHILD", "timestamps": [5.36, 5.60, 5.84, 6.00, 6.04], "tokens":[" LOVE", "LY", " CHI", "L", "D"]}

sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/test_wavs/1.wav
{"start_time":0.00, "keyword": "FOREVER", "timestamps": [10.88, 11.04, 11.08], "tokens":[" FOR", "E", "VER"]}

Customize your own keywords

To customize your own keywords, the only thing you need to do is replacing the --keywords-file. The keywords file is generated as follows:

For example your keywords are (keywords_raw.txt):

HELLO WORLD
HI GOOGLE
HEY SIRI

Run the following command:

sherpa-onnx-cli text2token \
  --tokens sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/tokens.txt \
  --tokens-type bpe \
  --bpe-model sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01/bpe.model \
  keywords_raw.txt keywords.txt

The keywords.txt looks like:

▁HE LL O ▁WORLD
▁HI ▁GO O G LE
▁HE Y ▁S I RI

Note

If you install sherpa-onnx from sources (i.e. not by pip), you can use the alternative script in scripts, the usage is almost the same as the command line tool, read the help information by:

python3 scripts/text2token.py --help