Keyword spotting
In this section, we describe how we implement the open vocabulary keyword spotting (aka customized keyword spotting) feature and how to use it in sherpa-onnx.
What is open vocabulary keyword spotting
An open vocabulary keyword spotting system is essentially a tiny ASR system that can only decode the words or phrases in a given keyword list. For example, if the given keyword is HELLO WORLD, then the decoded result is either HELLO WORLD or empty. "Open vocabulary" (or "customized") means that you can specify any keywords without re-training the model. To build a conventional keyword spotting system, people need to prepare a lot of audio-text pairs for the selected keywords, and the trained model can only be used to detect those selected keywords.
An open vocabulary keyword spotting system, in contrast, lets one system detect different keywords, even keywords that are not in the training data.
Decoder for open vocabulary keyword spotting
For now, we only implement a beam search decoder that restricts the system to triggering the given keywords (the model itself is actually a tiny ASR).
To balance the trigger rate against the false alarm rate, we introduce two parameters for each keyword: the boosting score and the trigger threshold.
The boosting score works like hotwords recognition: it helps the paths containing keywords survive beam search. The larger this score is, the more easily the corresponding keyword is triggered; read Hotwords (Contextual biasing) for more details.
The trigger threshold defines the minimum acoustic probability that a decoded sequence (token sequence) must reach to be triggered. It is a float value between 0 and 1; the lower this threshold is, the more easily the corresponding keyword is triggered.
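The role of the trigger threshold can be illustrated with a minimal sketch. It assumes the acoustic probability of a decoded sequence is the mean of its per-token probabilities; that aggregation, the function name should_trigger, and the example numbers are assumptions made for illustration, not a description of the actual decoder.

```python
from typing import Sequence


def should_trigger(token_probs: Sequence[float], threshold: float) -> bool:
    """Decide whether a matched keyword fires.

    Assumption: the sequence-level acoustic probability is the mean of the
    per-token probabilities. The real decoder may aggregate differently.
    """
    if not token_probs:
        # Nothing was decoded, so there is nothing to trigger.
        return False
    mean_prob = sum(token_probs) / len(token_probs)
    return mean_prob >= threshold


# Hypothetical per-token probabilities for one matched keyword.
probs = [0.5, 0.3]
print(should_trigger(probs, 0.35))  # lower threshold: easier to trigger
print(should_trigger(probs, 0.45))  # higher threshold: harder to trigger
```

With the same probabilities, lowering the threshold turns a rejected keyword into a triggered one, which is the trade-off between trigger rate and false alarms described above.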
Keywords file
The input keywords file looks like this (the keywords are HELLO WORLD, HI GOOGLE and HEY SIRI):
▁HE LL O ▁WORLD :1.5 #0.35
▁HI ▁GO O G LE :1.0 #0.25
▁HE Y ▁S I RI
Each line contains one keyword. The leading items (separated by spaces) are the encoded tokens of the keyword, the item starting with : is the boosting score, and the item starting with # is the trigger threshold. As the third line above shows, both of these items can be omitted. Note: There must be no space between : (or #) and the float value.
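The line format described above can be read with a few lines of Python. This is a minimal sketch for illustration only: the Keyword class and parse_keyword_line function are hypothetical names, not part of sherpa-onnx, and the sketch handles only the : and # items covered here (it does not handle items starting with @).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Keyword:
    tokens: List[str]
    boost: Optional[float] = None      # item starting with ":"
    threshold: Optional[float] = None  # item starting with "#"


def parse_keyword_line(line: str) -> Keyword:
    """Split one keywords-file line into tokens, boosting score and threshold."""
    tokens: List[str] = []
    boost: Optional[float] = None
    threshold: Optional[float] = None
    for item in line.split():
        if item.startswith(":"):
            # No space is allowed between ":" and the value, so the rest
            # of the item is the float itself.
            boost = float(item[1:])
        elif item.startswith("#"):
            threshold = float(item[1:])
        else:
            tokens.append(item)
    if not tokens:
        raise ValueError(f"no tokens found in line: {line!r}")
    return Keyword(tokens, boost, threshold)


print(parse_keyword_line("▁HE LL O ▁WORLD :1.5 #0.35"))
print(parse_keyword_line("▁HE Y ▁S I RI"))
```

A line written with a space after : or # (for example ": 1.5") makes float() raise ValueError on the empty string, which matches the rule that no space is allowed there.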
To get the tokens, use the command-line tool in sherpa-onnx to convert the original keywords. Its usage is as follows:
# Note: You need to run pip install sherpa-onnx to get the commandline tool: sherpa-onnx-cli
sherpa-onnx-cli text2token --help
Usage: sherpa-onnx-cli text2token [OPTIONS] INPUT OUTPUT
Options:
--text TEXT Path to the input texts. Each line in the texts contains the original phrase, it might also contain some extra items,
for example, the boosting score (startting with :), the triggering threshold
(startting with #, only used in keyword spotting task) and the original phrase (startting with @).
Note: extra items will be kept in the output.
example input 1 (tokens_type = ppinyin):
小爱同学 :2.0 #0.6 @小爱同学
你好问问 :3.5 @你好问问
小艺小艺 #0.6 @小艺小艺
example output 1:
x iǎo ài tóng x ué :2.0 #0.6 @小爱同学
n ǐ h ǎo w èn w èn :3.5 @你好问问
x iǎo y ì x iǎo y ì #0.6 @小艺小艺
example input 2 (tokens_type = bpe):
HELLO WORLD :1.5 #0.4
HI GOOGLE :2.0 #0.8
HEY SIRI #0.35
example output 2:
▁HE LL O ▁WORLD :1.5 #0.4
▁HI ▁GO O G LE :2.0 #0.8
▁HE Y ▁S I RI #0.35
--tokens TEXT The path to tokens.txt.
--tokens-type TEXT The type of modeling units, should be cjkchar, bpe, cjkchar+bpe, fpinyin or ppinyin.
fpinyin means full pinyin, each cjkchar has a pinyin(with tone). ppinyin
means partial pinyin, it splits pinyin into initial and final,
--bpe-model TEXT The path to bpe.model. Only required when tokens-type is bpe or cjkchar+bpe.
--help Show this message and exit.
Note
If the tokens-type is fpinyin or ppinyin, you MUST provide the original keyword (starting with @).
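The rule in this note can be checked before running the conversion. The sketch below is a hypothetical helper (lines_missing_phrase is not part of sherpa-onnx) that reports which lines of a raw keywords file lack an item starting with @ when a pinyin token type is used.

```python
from typing import Iterable, List

# Token types for which the original keyword (@...) is mandatory.
PINYIN_TYPES = {"fpinyin", "ppinyin"}


def lines_missing_phrase(lines: Iterable[str], tokens_type: str) -> List[int]:
    """Return 1-based numbers of non-empty lines that have no @ item.

    For token types other than fpinyin/ppinyin nothing is required,
    so the result is always empty.
    """
    if tokens_type not in PINYIN_TYPES:
        return []
    missing: List[int] = []
    for number, line in enumerate(lines, start=1):
        items = line.split()
        if items and not any(item.startswith("@") for item in items):
            missing.append(number)
    return missing


raw = ["小爱同学 :2.0 #0.6 @小爱同学", "你好问问 :3.5"]
print(lines_missing_phrase(raw, "ppinyin"))  # prints [2]
```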
Note
If you install sherpa-onnx from source (i.e., not via pip), you can use the alternative script in scripts. Its usage is almost the same as the command-line tool; read the help information with:
python3 scripts/text2token.py --help
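Putting the options from the help message together, a conversion call could be assembled as in the sketch below. It only builds and prints the command; the helper name build_text2token_cmd and all file names (keywords_raw.txt, keywords.txt, tokens.txt, bpe.model) are placeholders chosen for illustration. The option names and the INPUT OUTPUT order are taken from the help text above.

```python
import shlex
from typing import List, Optional

ALLOWED_TYPES = {"cjkchar", "bpe", "cjkchar+bpe", "fpinyin", "ppinyin"}


def build_text2token_cmd(
    input_path: str,
    output_path: str,
    tokens: str,
    tokens_type: str,
    bpe_model: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `sherpa-onnx-cli text2token`."""
    if tokens_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported tokens type: {tokens_type}")
    # Per the help text, bpe.model is only required for bpe and cjkchar+bpe.
    if tokens_type in ("bpe", "cjkchar+bpe") and bpe_model is None:
        raise ValueError("--bpe-model is required for bpe and cjkchar+bpe")
    cmd = [
        "sherpa-onnx-cli", "text2token",
        "--tokens", tokens,
        "--tokens-type", tokens_type,
    ]
    if bpe_model is not None:
        cmd += ["--bpe-model", bpe_model]
    # INPUT and OUTPUT are positional arguments.
    cmd += [input_path, output_path]
    return cmd


# Placeholder file names; replace them with the paths of your own model.
cmd = build_text2token_cmd(
    "keywords_raw.txt", "keywords.txt", "tokens.txt", "bpe", "bpe.model"
)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once sherpa-onnx is installed and the files exist.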
How to use keyword spotting in sherpa-onnx
Currently, we provide a command-line tool and an Android app for keyword spotting.
Command-line tool
After installing sherpa-onnx, type sherpa-onnx-keyword-spotter --help for the help message.
Android application
You can find pre-built Android APKs for keyword spotting at
Here is a demo video (Note: It is in Chinese).
Pretrained models
You can find the pre-trained models in Pre-trained models.