sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 (English)
This model is converted from
You can find the conversion script at
In the following, we describe how to download it and use it with sherpa-onnx.
Hint
This model outputs punctuation and letter case.
Android APK for real-time speech recognition
Please visit https://k2-fsa.github.io/sherpa/onnx/android/apk-simulate-streaming-asr.html
and search for parakeet_tdt_0.6b_v2.
Hint
Please always use the latest version. For instance, you can use
sherpa-onnx-1.12.40-arm64-v8a-simulated_streaming_asr-en-parakeet_tdt_0.6b_v2.apk
for arm64-v8a.
Download the model
Please use the following commands to download it.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar xvf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
Hint
If you want to try the float16 quantized model, please use sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2.
If you want to try the non-quantized decoder and joiner models, please use sherpa-onnx-nemo-parakeet-tdt-0.6b-v2.tar.bz2.
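The archive names for the three variants differ only in their suffix. As a minimal sketch, the hypothetical helper `parakeet_archive` below maps a variant label to the archive names listed above. The labels `int8`, `fp16`, and `fp32` are chosen for this sketch, and it is an assumption (not stated above) that the fp16 and non-quantized archives are published under the same `asr-models` release URL as the int8 one.

```shell
# Hypothetical helper: print the archive name for a model variant.
# The variant labels (int8, fp16, fp32) are names chosen for this sketch.
parakeet_archive() {
  case "$1" in
    int8) echo "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2" ;;
    fp16) echo "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2" ;;
    fp32) echo "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2.tar.bz2" ;;  # non-quantized decoder and joiner
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
}

# Assumption: every variant lives under the same release URL as the int8 archive.
release_url=https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models

# Print the URL only; to download, pass the same string to wget.
echo "$release_url/$(parakeet_archive fp16)"
```

If you switch variants, the file names inside the extracted directory may also differ from the int8 ones used in the commands on this page, so check the directory listing before copying the flags.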
You should see something like below after downloading:
ls -lh sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/
total 1295752
-rw-r--r-- 1 fangjun staff 6.9M May 6 16:24 decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 622M May 6 16:24 encoder.int8.onnx
-rw-r--r-- 1 fangjun staff 1.7M May 6 16:24 joiner.int8.onnx
drwxr-xr-x 3 fangjun staff 96B May 6 16:24 test_wavs
-rw-r--r-- 1 fangjun staff 9.2K May 6 16:24 tokens.txt
Decode wave files
Hint
Only single-channel wave files with 16-bit samples are supported; the sampling rate does not need to be 16 kHz.
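To check a file against this constraint before decoding, the channel count and sample width can be read from the wave header. The sketch below uses `od` and makes two assumptions: the file has a canonical RIFF header in which the `fmt ` chunk starts at byte 12, and the host is little-endian (`od` decodes integers in host byte order). The function names `wav_info` and `wav_is_supported` are hypothetical and not part of sherpa-onnx.

```shell
# Print "<channels> <bits-per-sample> <sample-rate>" for a wave file.
# Assumes a canonical RIFF header and a little-endian host.
wav_info() {
  ch=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d '[:space:]')    # channels at byte 22
  rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d '[:space:]')  # sample rate at byte 24
  bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d '[:space:]')  # bits per sample at byte 34
  echo "$ch $bits $rate"
}

# Succeed only if the file is single channel with 16-bit samples.
wav_is_supported() {
  [ "$(wav_info "$1" | cut -d' ' -f1,2)" = "1 16" ]
}

# Example use:
#   wav_is_supported ./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav \
#     && echo "ok to decode"
```

Files with extra chunks before `fmt ` would need a real parser; this check only covers the common layout.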
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
Note
Please use ./build/bin/Release/sherpa-onnx-offline.exe for Windows.
You should see the following output:
Real-time/Streaming Speech recognition from a microphone with VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-simulated-streaming-asr \
--silero-vad-model=./silero_vad.onnx \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer
Speech recognition from a microphone
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone-offline \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer
Speech recognition from a microphone with VAD
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
./build/bin/sherpa-onnx-vad-microphone-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer
Decode a long audio file with VAD
The following example shows how to decode a very long audio file with the help of VAD.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/Obama.wav
./build/bin/sherpa-onnx-vad-with-offline-asr \
--silero-vad-model=./silero_vad.onnx \
--silero-vad-threshold=0.2 \
--silero-vad-min-speech-duration=0.2 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./Obama.wav
You should see the following output:
Hint
If you want a GUI version that can export results in SRT format, please visit
https://k2-fsa.github.io/sherpa/onnx/tauri/app/vad-asr-file.html and search for
en-parakeet_tdt. Please always use the latest version.
RTF on RK3588 with Cortex A76 CPU
In the following, we test this model on RK3588 with Cortex A76 CPU.
Information about the CPUs on the board is given below:
You can see that it has 8 CPUs: 4 Cortex A55 + 4 Cortex A76.
We use taskset below to test the RTF on Cortex A76.
taskset 0x80 sherpa-onnx-offline \
--num-threads=1 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
Its output is given below:
To test the RTF with different --num-threads, we use:
taskset 0xc0 sherpa-onnx-offline \
--num-threads=2 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
taskset 0xe0 sherpa-onnx-offline \
--num-threads=3 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
taskset 0xf0 sherpa-onnx-offline \
--num-threads=4 \
--encoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx \
--decoder=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx \
--joiner=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx \
--tokens=./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt \
--model-type=nemo_transducer \
./sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/test_wavs/0.wav
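The value passed to taskset is a CPU affinity bit mask in which bit i selects CPU i, so 0x80 pins the process to CPU 7 and 0xf0 to CPUs 4 through 7. As a sketch, the hypothetical helper `top_cores_mask` below reproduces the masks used in the commands above. It assumes, as those masks imply, that the four Cortex A76 cores are the highest-numbered CPUs on this board.

```shell
# Hypothetical helper: mask selecting the $1 highest-numbered CPUs out of $2.
top_cores_mask() {
  printf '0x%x\n' $(( ((1 << $1) - 1) << ($2 - $1) ))
}

# Print the mask that goes with each thread count on an 8-CPU board.
for n in 1 2 3 4; do
  echo "--num-threads=$n: taskset $(top_cores_mask "$n" 8)"
done
# --num-threads=1: taskset 0x80
# --num-threads=2: taskset 0xc0
# --num-threads=3: taskset 0xe0
# --num-threads=4: taskset 0xf0
```

Giving each run exactly as many A76 cores as threads keeps the threads off the slower A55 cores, which would otherwise skew the measurement.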
The results are summarized below:
| Number of threads | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| RTF on Cortex A76 CPU | 0.220 | 0.142 | 0.118 | 0.088 |
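RTF (real-time factor) is the processing time divided by the audio duration, so a value below 1 means decoding runs faster than real time. To read the table as scaling behaviour, the sketch below divides the single-thread RTF by each multi-thread RTF using awk. The function name `speedup` is chosen here for illustration.

```shell
# Speedup of a run relative to a baseline RTF, rounded to two decimals.
speedup() {
  awk -v base="$1" -v rtf="$2" 'BEGIN { printf "%.2f\n", base / rtf }'
}

# RTF values taken from the table above (1 to 4 threads).
for rtf in 0.220 0.142 0.118 0.088; do
  echo "RTF $rtf: $(speedup 0.220 "$rtf")x"
done
# RTF 0.220: 1.00x
# RTF 0.142: 1.55x
# RTF 0.118: 1.86x
# RTF 0.088: 2.50x
```

Four threads give roughly a 2.5x speedup over one thread, so scaling on this board is sublinear.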