Pre-trained models
whisper
Currently, we support whisper multilingual models for spoken language identification.
Model type | Huggingface repo
medium | https://huggingface.co/csukuangfj/sherpa-onnx-whisper-medium
In the following, we use the tiny model as an example. You can
replace tiny with base, small, or medium and everything still holds.
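Since only the size name changes between variants, the archive URL and model paths can be derived from it. The following is a minimal sketch that assumes every size follows the naming pattern shown for tiny on this page; the helper name whisper_model_files and the constants are hypothetical and not part of sherpa-onnx.

```python
RELEASE_URL = "https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models"
SIZES = ("tiny", "base", "small", "medium")


def whisper_model_files(size: str, int8: bool = False) -> dict:
    """Return the archive URL and model paths for one whisper size.

    Hypothetical helper: it assumes every size follows the naming
    pattern that this page shows for the tiny model.
    """
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    model_dir = f"sherpa-onnx-whisper-{size}"
    suffix = ".int8.onnx" if int8 else ".onnx"
    return {
        "url": f"{RELEASE_URL}/{model_dir}.tar.bz2",
        "encoder": f"./{model_dir}/{size}-encoder{suffix}",
        "decoder": f"./{model_dir}/{size}-decoder{suffix}",
        "tokens": f"./{model_dir}/{size}-tokens.txt",
    }


print(whisper_model_files("tiny")["url"])
```

Passing int8=True selects the quantized files, which are smaller on disk than the float32 ones.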
Download the model
Please use the following commands to download the tiny model:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
tar xvf sherpa-onnx-whisper-tiny.tar.bz2
rm sherpa-onnx-whisper-tiny.tar.bz2
You should find the following files after extracting the archive:
-rw-r--r-- 1 fangjun staff 427B Jan 31 16:21 README.md
-rwxr-xr-x 1 fangjun staff 19K Jan 31 16:21 export-onnx.py
-rw-r--r-- 1 fangjun staff 15B Jan 31 16:21 requirements.txt
-rwxr-xr-x 1 fangjun staff 12K Jan 31 16:21 test.py
drwxr-xr-x 6 fangjun staff 192B Jan 31 16:22 test_wavs
-rw-r--r-- 1 fangjun staff 86M Jan 31 16:22 tiny-decoder.int8.onnx
-rw-r--r-- 1 fangjun staff 109M Jan 31 16:22 tiny-decoder.onnx
-rw-r--r-- 1 fangjun staff 12M Jan 31 16:22 tiny-encoder.int8.onnx
-rw-r--r-- 1 fangjun staff 36M Jan 31 16:22 tiny-encoder.onnx
-rw-r--r-- 1 fangjun staff 798K Jan 31 16:22 tiny-tokens.txt
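A quick way to confirm that the extraction succeeded is to check for the model files in the listing above. The sketch below does this with the standard library only; the function name missing_model_files is hypothetical, and the expected file names are taken from the listing.

```python
from pathlib import Path


def missing_model_files(model_dir: str, size: str = "tiny") -> list:
    """Return the expected model files that are absent from model_dir."""
    expected = [
        f"{size}-encoder.onnx",
        f"{size}-encoder.int8.onnx",
        f"{size}-decoder.onnx",
        f"{size}-decoder.int8.onnx",
        f"{size}-tokens.txt",
    ]
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./sherpa-onnx-whisper-tiny")
    print("all model files present" if not missing else f"missing: {missing}")
```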
Download test waves
Please use the following command to download test data:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/spoken-language-identification-test-wavs.tar.bz2
tar xvf spoken-language-identification-test-wavs.tar.bz2
rm spoken-language-identification-test-wavs.tar.bz2
You can find the following test files after extracting the archive:
-rw-r--r-- 1 fangjun staff 222K Mar 24 12:51 ar-arabic.wav
-rw-r--r--@ 1 fangjun staff 137K Mar 24 13:09 bg-bulgarian.wav
-rw-r--r-- 1 fangjun staff 83K Mar 24 13:07 cs-czech.wav
-rw-r--r-- 1 fangjun staff 112K Mar 24 13:07 da-danish.wav
-rw-r--r-- 1 fangjun staff 199K Mar 24 12:50 de-german.wav
-rw-r--r-- 1 fangjun staff 207K Mar 24 13:06 el-greek.wav
-rw-r--r-- 1 fangjun staff 31K Mar 24 12:45 en-english.wav
-rw-r--r--@ 1 fangjun staff 77K Mar 24 12:23 es-spanish.wav
-rw-r--r--@ 1 fangjun staff 371K Mar 24 12:21 fa-persian.wav
-rw-r--r-- 1 fangjun staff 136K Mar 24 13:08 fi-finnish.wav
-rw-r--r-- 1 fangjun staff 112K Mar 24 12:49 fr-french.wav
-rw-r--r-- 1 fangjun staff 179K Mar 24 12:47 hi-hindi.wav
-rw-r--r--@ 1 fangjun staff 177K Mar 24 12:29 hr-croatian.wav
-rw-r--r-- 1 fangjun staff 167K Mar 24 12:53 id-indonesian.wav
-rw-r--r-- 1 fangjun staff 136K Mar 24 12:54 it-italian.wav
-rw-r--r-- 1 fangjun staff 46K Mar 24 12:44 ja-japanese.wav
-rw-r--r--@ 1 fangjun staff 122K Mar 24 12:52 ko-korean.wav
-rw-r--r-- 1 fangjun staff 85K Mar 24 12:54 nl-dutch.wav
-rw-r--r--@ 1 fangjun staff 241K Mar 24 12:38 no-norwegian.wav
-rw-r--r--@ 1 fangjun staff 121K Mar 24 12:35 po-polish.wav
-rw-r--r-- 1 fangjun staff 166K Mar 24 12:48 pt-portuguese.wav
-rw-r--r--@ 1 fangjun staff 144K Mar 24 12:33 ro-romanian.wav
-rw-r--r-- 1 fangjun staff 111K Mar 24 12:51 ru-russian.wav
-rw-r--r--@ 1 fangjun staff 239K Mar 24 12:40 sk-slovak.wav
-rw-r--r-- 1 fangjun staff 196K Mar 24 13:01 sv-swedish.wav
-rw-r--r-- 1 fangjun staff 106K Mar 24 13:14 ta-tamil.wav
-rw-r--r-- 1 fangjun staff 104K Mar 24 13:02 tl-tagalog.wav
-rw-r--r-- 1 fangjun staff 76K Mar 24 13:00 tr-turkish.wav
-rw-r--r-- 1 fangjun staff 188K Mar 24 13:05 uk-ukrainian.wav
-rw-r--r-- 1 fangjun staff 181K Mar 24 13:20 zh-chinese.wav
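Each test file is named with a short code followed by the language name, which makes it easy to derive the expected label when checking results. A minimal sketch, assuming the naming scheme seen in the listing above; expected_language is a hypothetical helper.

```python
from pathlib import Path


def expected_language(wav_path: str) -> tuple:
    """Split '<code>-<language>.wav' into (code, language)."""
    code, _, language = Path(wav_path).stem.partition("-")
    return code, language


print(expected_language("./spoken-language-identification-test-wavs/de-german.wav"))
```

Note that the prefix is a file-naming convention rather than a guaranteed ISO 639-1 code: po-polish.wav uses po, whereas the ISO 639-1 code for Polish is pl, so a detector that reports ISO codes would likely return pl for that file.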
Test with Python APIs
After installing sherpa-onnx, either from source or with pip install sherpa-onnx, you can run:
python3 ./python-api-examples/spoken-language-identification.py \
--whisper-encoder ./sherpa-onnx-whisper-tiny/tiny-encoder.int8.onnx \
--whisper-decoder ./sherpa-onnx-whisper-tiny/tiny-decoder.onnx \
./spoken-language-identification-test-wavs/de-german.wav
You should see the following output:
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:158] File: ./spoken-language-identification-test-wavs/de-german.wav
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:159] Detected language: de
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:160] Elapsed seconds: 0.275
2024-04-17 15:53:23,105 INFO [spoken-language-identification.py:161] Audio duration in seconds: 6.374
2024-04-17 15:53:23,105 INFO [spoken-language-identification.py:162] RTF: 0.275/6.374 = 0.043
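The RTF (real-time factor) in the last line is the elapsed processing time divided by the audio duration; a value below 1 means the file was processed faster than real time. The sketch below extracts the detected language from log lines like the ones above and recomputes the RTF. The function name parse_log is hypothetical, and the regular expressions assume the exact log wording shown here.

```python
import re

# Three of the log lines shown above, copied verbatim.
LOG = """\
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:159] Detected language: de
2024-04-17 15:53:23,104 INFO [spoken-language-identification.py:160] Elapsed seconds: 0.275
2024-04-17 15:53:23,105 INFO [spoken-language-identification.py:161] Audio duration in seconds: 6.374
"""


def parse_log(text: str) -> dict:
    """Extract language, timings, and the real-time factor from the log."""
    language = re.search(r"Detected language: (\S+)", text).group(1)
    elapsed = float(re.search(r"Elapsed seconds: ([\d.]+)", text).group(1))
    duration = float(
        re.search(r"Audio duration in seconds: ([\d.]+)", text).group(1)
    )
    return {
        "language": language,
        "elapsed": elapsed,
        "duration": duration,
        "rtf": elapsed / duration,  # processing time per second of audio
    }


result = parse_log(LOG)
print(f"{result['language']} RTF: {result['rtf']:.3f}")  # de RTF: 0.043
```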
Hint
You can find spoken-language-identification.py in the python-api-examples directory of the sherpa-onnx repository.
Android APKs
You can find pre-built Android APKs for spoken language identification at the following address:
Huggingface space
We provide a huggingface space for spoken language identification.
You can visit the following URL:
Note
For Chinese users, you can use the following mirror: