Pre-trained models
whisper
Currently, we support whisper multilingual models for spoken language identification.
Model type | Huggingface repo
medium | https://huggingface.co/csukuangfj/sherpa-onnx-whisper-medium
In the following, we use the tiny model as an example. You can replace tiny with base, small, or medium and everything still holds.
Download the model
Please use the following commands to download the tiny model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
tar xvf sherpa-onnx-whisper-tiny.tar.bz2
rm sherpa-onnx-whisper-tiny.tar.bz2
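If wget and tar are not available, the same download, extract, and clean-up steps can be sketched in Python with the standard library only. The helper names archive_url and download_and_extract are hypothetical, and the archive names for base, small, and medium are assumed to follow the tiny pattern shown above.

```python
import tarfile
import urllib.request
from pathlib import Path

# Release location taken from the wget command above.
BASE_URL = "https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models"


def archive_url(size: str = "tiny") -> str:
    """Build the archive URL for a model size.

    Assumption: the other sizes are published under the same naming scheme
    as sherpa-onnx-whisper-tiny.tar.bz2.
    """
    return f"{BASE_URL}/sherpa-onnx-whisper-{size}.tar.bz2"


def download_and_extract(url: str, dest: Path = Path(".")) -> Path:
    """Download a .tar.bz2 archive into dest, unpack it, and delete the archive."""
    archive = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)  # equivalent of: wget <url>
    with tarfile.open(archive, "r:bz2") as tar:  # equivalent of: tar xvf <archive>
        tar.extractall(dest)
    archive.unlink()  # equivalent of: rm <archive>
    # Assumption: the archive unpacks into a directory named after itself.
    return dest / archive.name[: -len(".tar.bz2")]
```

Calling download_and_extract(archive_url("tiny")) would then mirror the three shell commands. Only extract archives from a source you trust, since extractall writes whatever paths the archive contains.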
You should find the following files after extraction:
-rw-r--r-- 1 fangjun staff 427B Jan 31 16:21 README.md
-rwxr-xr-x 1 fangjun staff 19K Jan 31 16:21 export-onnx.py
-rw-r--r-- 1 fangjun staff 15B Jan 31 16:21 requirements.txt
-rwxr-xr-x 1 fangjun staff 12K Jan 31 16:21 test.py
drwxr-xr-x 6 fangjun staff 192B Jan 31 16:22 test_wavs
-rw-r--r-- 1 fangjun staff 86M Jan 31 16:22 tiny-decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 109M Jan 31 16:22 tiny-decoder.onnx
-rw-r--r-- 1 fangjun staff 12M Jan 31 16:22 tiny-encoder.int8.onnx
-rw-r--r-- 1 fangjun staff 36M Jan 31 16:22 tiny-encoder.onnx
-rw-r--r-- 1 fangjun staff 798K Jan 31 16:22 tiny-tokens.txt
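Because the file names in the listing are all prefixed with the model size, switching between sizes comes down to building the right paths. The following is a minimal sketch; model_paths is a hypothetical helper, and it assumes that the base, small, and medium archives use the same layout as the tiny one.

```python
from pathlib import Path


def model_paths(size: str = "tiny", model_dir: str = ".") -> dict:
    """Return the expected model files for a given size.

    Assumption: every size follows the layout of the tiny archive,
    i.e. sherpa-onnx-whisper-<size>/<size>-encoder.onnx and so on.
    """
    root = Path(model_dir) / f"sherpa-onnx-whisper-{size}"
    return {
        "encoder": root / f"{size}-encoder.onnx",
        "encoder_int8": root / f"{size}-encoder.int8.onnx",
        "decoder": root / f"{size}-decoder.onnx",
        "decoder_int8": root / f"{size}-decoder.int8.onnx",
        "tokens": root / f"{size}-tokens.txt",
    }


def missing_files(paths: dict) -> list:
    """List the entries that do not exist on disk, e.g. after a partial download."""
    return [name for name, path in paths.items() if not path.exists()]
```

For example, missing_files(model_paths("tiny")) returns an empty list once the archive has been extracted into the current directory.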
Download test waves
Please use the following command to download test data:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/spoken-language-identification-test-wavs.tar.bz2
tar xvf spoken-language-identification-test-wavs.tar.bz2
rm spoken-language-identification-test-wavs.tar.bz2
You can find the following test files after extraction:
-rw-r--r-- 1 fangjun staff 222K Mar 24 12:51 ar-arabic.wav
-rw-r--r--@ 1 fangjun staff 137K Mar 24 13:09 bg-bulgarian.wav
-rw-r--r-- 1 fangjun staff 83K Mar 24 13:07 cs-czech.wav
-rw-r--r-- 1 fangjun staff 112K Mar 24 13:07 da-danish.wav
-rw-r--r-- 1 fangjun staff 199K Mar 24 12:50 de-german.wav
-rw-r--r-- 1 fangjun staff 207K Mar 24 13:06 el-greek.wav
-rw-r--r-- 1 fangjun staff 31K Mar 24 12:45 en-english.wav
-rw-r--r--@ 1 fangjun staff 77K Mar 24 12:23 es-spanish.wav
-rw-r--r--@ 1 fangjun staff 371K Mar 24 12:21 fa-persian.wav
-rw-r--r-- 1 fangjun staff 136K Mar 24 13:08 fi-finnish.wav
-rw-r--r-- 1 fangjun staff 112K Mar 24 12:49 fr-french.wav
-rw-r--r-- 1 fangjun staff 179K Mar 24 12:47 hi-hindi.wav
-rw-r--r--@ 1 fangjun staff 177K Mar 24 12:29 hr-croatian.wav
-rw-r--r-- 1 fangjun staff 167K Mar 24 12:53 id-indonesian.wav
-rw-r--r-- 1 fangjun staff 136K Mar 24 12:54 it-italian.wav
-rw-r--r-- 1 fangjun staff 46K Mar 24 12:44 ja-japanese.wav
-rw-r--r--@ 1 fangjun staff 122K Mar 24 12:52 ko-korean.wav
-rw-r--r-- 1 fangjun staff 85K Mar 24 12:54 nl-dutch.wav
-rw-r--r--@ 1 fangjun staff 241K Mar 24 12:38 no-norwegian.wav
-rw-r--r--@ 1 fangjun staff 121K Mar 24 12:35 po-polish.wav
-rw-r--r-- 1 fangjun staff 166K Mar 24 12:48 pt-portuguese.wav
-rw-r--r--@ 1 fangjun staff 144K Mar 24 12:33 ro-romanian.wav
-rw-r--r-- 1 fangjun staff 111K Mar 24 12:51 ru-russian.wav
-rw-r--r--@ 1 fangjun staff 239K Mar 24 12:40 sk-slovak.wav
-rw-r--r-- 1 fangjun staff 196K Mar 24 13:01 sv-swedish.wav
-rw-r--r-- 1 fangjun staff 106K Mar 24 13:14 ta-tamil.wav
-rw-r--r-- 1 fangjun staff 104K Mar 24 13:02 tl-tagalog.wav
-rw-r--r-- 1 fangjun staff 76K Mar 24 13:00 tr-turkish.wav
-rw-r--r-- 1 fangjun staff 188K Mar 24 13:05 uk-ukrainian.wav
-rw-r--r-- 1 fangjun staff 181K Mar 24 13:20 zh-chinese.wav
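Each test file is named with a short prefix followed by the language name, which makes it easy to compare a detected language against the file's label. The sketch below uses a hypothetical helper, expected_language, and assumes only the naming pattern visible in the listing. Treat the prefix as a label rather than a guaranteed language code: for example, the Polish file is named po-polish.wav, while the ISO 639-1 code for Polish is pl.

```python
from pathlib import Path


def expected_language(wav_path: str) -> tuple:
    """Split a test file name of the form '<prefix>-<language>.wav'.

    Returns (prefix, language). The prefix is the label used in the file
    name, which is not guaranteed to equal the code reported by the model.
    """
    prefix, _, language = Path(wav_path).stem.partition("-")
    return prefix, language
```

A comparison such as expected_language(path)[0] == detected can then be used to flag mismatches across the whole test set.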
Test with Python APIs
After installing sherpa-onnx, either from source or with pip install sherpa-onnx, you can run:
python3 ./python-api-examples/spoken-language-identification.py \
--whisper-encoder ./sherpa-onnx-whisper-tiny/tiny-encoder.int8.onnx \
--whisper-decoder ./sherpa-onnx-whisper-tiny/tiny-decoder.onnx \
./spoken-language-identification-test-wavs/de-german.wav
You should see the following output:
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:158] File: ./spoken-language-identification-test-wavs/de-german.wav
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:159] Detected language: de
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:160] Elapsed seconds: 0.275
2024-04-17 15:53:23,105 INFO [spoken-language-identification.py:161] Audio duration in seconds: 6.374
2024-04-17 15:53:23,105 INFO [spoken-language-identification.py:162] RTF: 0.275/6.374 = 0.043
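The RTF (real-time factor) in the last line is the elapsed processing time divided by the audio duration, so values below 1 mean the audio is processed faster than real time. To collect results for many files, the log lines can be parsed as sketched below. parse_result is a hypothetical helper, and it assumes the log format is exactly the one shown above.

```python
import re

# Matches the "<label>: <value>" part that follows the "[file.py:line]" prefix.
LOG_LINE = re.compile(
    r"\] (File|Detected language|Elapsed seconds|Audio duration in seconds): (.+)$"
)


def parse_result(log_text: str) -> dict:
    """Extract the detected language and timing from the script's log output.

    Assumption: the log lines look like the sample output in this section.
    """
    fields = {}
    for line in log_text.splitlines():
        match = LOG_LINE.search(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    elapsed = float(fields["Elapsed seconds"])
    duration = float(fields["Audio duration in seconds"])
    return {
        "file": fields["File"],
        "language": fields["Detected language"],
        "elapsed": elapsed,
        "duration": duration,
        "rtf": elapsed / duration,  # real-time factor, recomputed from the two timings
    }
```

Recomputing the RTF from the two timings, instead of parsing the RTF line, keeps the parser independent of how that line is rounded.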
Hint
You can find spoken-language-identification.py in the python-api-examples directory of the sherpa-onnx repository.
Android APKs
You can find pre-built Android APKs for spoken language identification at the following address:
Huggingface space
We provide a huggingface space for spoken language identification.
You can visit the following URL:
Note
For Chinese users, you can use the following mirror: