Pre-trained models
This section lists pre-trained models for adding punctuation to text.
You can find all models at the following URL:
sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12
This model is converted from
and it supports both Chinese and English.
Hint
To see how the model was converted for sherpa-onnx, download it; the conversion scripts are included in the downloaded model directory.
In the following, we describe how to download and use it with sherpa-onnx.
Download the model
Please use the following commands to download it:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/punctuation-models/sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
tar xvf sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
rm sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12.tar.bz2
You will find the following files after extracting the archive:
-rw-r--r-- 1 fangjun staff 1.4K Apr 12 12:32 README.md
-rwxr-xr-x 1 fangjun staff 1.6K Apr 12 14:40 add-model-metadata.py
-rw-r--r-- 1 fangjun staff 810B Apr 12 11:56 config.yaml
-rw-r--r-- 1 fangjun staff 42B Apr 12 11:45 configuration.json
-rw-r--r-- 1 fangjun staff 281M Apr 12 14:40 model.onnx
-rwxr-xr-x 1 fangjun staff 745B Apr 12 11:53 show-model-input-output.py
-rwxr-xr-x 1 fangjun staff 4.9K Apr 13 18:45 test.py
-rw-r--r-- 1 fangjun staff 4.0M Apr 12 11:56 tokens.json
Only model.onnx is needed by sherpa-onnx. The other files are included for reference and show how the model was converted to sherpa-onnx.
C++ binary examples
After installing sherpa-onnx, you can use the following command to add punctuation to text:
./bin/sherpa-onnx-offline-punctuation \
--ct-transformer=./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx \
"我们都是木头人不会说话不会动"
The output is given below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./bin/sherpa-onnx-offline-punctuation --ct-transformer=./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx '我们都是木头人不会说话不会动'
OfflinePunctuationConfig(model=OfflinePunctuationModelConfig(ct_transformer="./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx", num_threads=1, debug=False, provider="cpu"))
Creating OfflinePunctuation ...
Started
Done
Num threads: 1
Elapsed seconds: 0.007 s
Input text: 我们都是木头人不会说话不会动
Output text: 我们都是木头人,不会说话不会动。
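When the binary is driven from a script, the punctuated text has to be picked out of the surrounding log lines. The following is a minimal Python sketch, assuming the captured log keeps the `Output text:` prefix shown above. The helper name `extract_output_text` is hypothetical and not part of sherpa-onnx.

```python
from typing import Optional


def extract_output_text(log: str) -> Optional[str]:
    """Return the text after the last 'Output text:' prefix, or None if absent."""
    prefix = "Output text:"
    result = None
    for line in log.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Keep the last match in case the log holds several runs.
            result = line[len(prefix):].strip()
    return result


# Sample taken from the log shown above.
sample_log = """Num threads: 1
Elapsed seconds: 0.007 s
Input text: 我们都是木头人不会说话不会动
Output text: 我们都是木头人,不会说话不会动。"""

print(extract_output_text(sample_log))
```

The function returns `None` instead of raising when no such line is present, so the caller can decide how to handle a failed run.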
The second example is for text containing both Chinese and English:
./bin/sherpa-onnx-offline-punctuation \
--ct-transformer=./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx \
"这是一个测试你好吗How are you我很好thank you are you ok谢谢你"
Its output is given below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./bin/sherpa-onnx-offline-punctuation --ct-transformer=./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx '这是一个测试你好吗How are you我很好thank you are you ok谢谢你'
OfflinePunctuationConfig(model=OfflinePunctuationModelConfig(ct_transformer="./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx", num_threads=1, debug=False, provider="cpu"))
Creating OfflinePunctuation ...
Started
Done
Num threads: 1
Elapsed seconds: 0.005 s
Input text: 这是一个测试你好吗How are you我很好thank you are you ok谢谢你
Output text: 这是一个测试,你好吗?How are you?我很好?thank you,are you ok,谢谢你。
The last example is for text containing only English:
./bin/sherpa-onnx-offline-punctuation \
--ct-transformer=./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx \
"The African blogosphere is rapidly expanding bringing more voices online in the form of commentaries opinions analyses rants and poetry"
Its output is given below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./bin/sherpa-onnx-offline-punctuation --ct-transformer=./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx 'The African blogosphere is rapidly expanding bringing more voices online in the form of commentaries opinions analyses rants and poetry'
OfflinePunctuationConfig(model=OfflinePunctuationModelConfig(ct_transformer="./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx", num_threads=1, debug=False, provider="cpu"))
Creating OfflinePunctuation ...
Started
Done
Num threads: 1
Elapsed seconds: 0.003 s
Input text: The African blogosphere is rapidly expanding bringing more voices online in the form of commentaries opinions analyses rants and poetry
Output text: The African blogosphere is rapidly expanding,bringing more voices online in the form of commentaries,opinions,analyses,rants and poetry。
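The English output above ends with the CJK full stop `。` and has no space after its commas. If the text is known to be English only, a small post-processing step can normalize it. The sketch below is an assumption about what such a step might look like, not something sherpa-onnx provides; `to_english_punctuation` is a hypothetical name, and the function should not be applied to Chinese or mixed text.

```python
import re

# Full-width punctuation mapped to ASCII equivalents.
_FULLWIDTH_TO_ASCII = str.maketrans(
    {",": ",", "。": ".", "?": "?", "!": "!", "、": ","}
)


def to_english_punctuation(text: str) -> str:
    """Convert full-width marks to ASCII and add a space before a following letter."""
    text = text.translate(_FULLWIDTH_TO_ASCII)
    # Insert one space after a mark that is directly followed by an ASCII letter.
    return re.sub(r"([,.?!])(?=[A-Za-z])", r"\1 ", text)


raw = (
    "The African blogosphere is rapidly expanding,bringing more voices online "
    "in the form of commentaries,opinions,analyses,rants and poetry。"
)
print(to_english_punctuation(raw))
```

The lookahead only matches letters, so a decimal such as `3.14` is left alone, while an abbreviation such as `e.g.` would gain a space and may need special handling.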
Python API examples
Please see
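For orientation, here is a minimal sketch of the same operation from Python. It assumes the `sherpa_onnx` package is installed and that the installed version exposes `OfflinePunctuation`, `OfflinePunctuationConfig`, and `OfflinePunctuationModelConfig`, mirroring the config printed by the C++ binary above. The wrapper name `add_punctuation` and its signature are illustrative only.

```python
from pathlib import Path
from typing import List


def add_punctuation(model_path: str, texts: List[str]) -> List[str]:
    """Punctuate each text with the CT-Transformer model at model_path."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"{model_path} does not exist")

    # Imported lazily so the path check above works without the package.
    # Assumption: the installed sherpa_onnx version provides these classes.
    import sherpa_onnx

    config = sherpa_onnx.OfflinePunctuationConfig(
        model=sherpa_onnx.OfflinePunctuationModelConfig(ct_transformer=model_path),
    )
    punct = sherpa_onnx.OfflinePunctuation(config)
    return [punct.add_punctuation(t) for t in texts]
```

With the model downloaded as described above, calling `add_punctuation("./sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12/model.onnx", ["我们都是木头人不会说话不会动"])` would be expected to return a one-element list holding the punctuated sentence.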
Huggingface space examples
Please see
Hint
For Chinese users, please visit the following mirrors:
Video demos
The following video is in Chinese.