Kokoro
This page lists pre-trained models from https://huggingface.co/hexgrad/Kokoro-82M.
kokoro-multi-lang-v1_0 (Chinese + English, 53 speakers)
This model contains 53 speakers. The ONNX model is from https://github.com/taylorchu/kokoro-onnx/releases/tag/v0.2.0
Hint
If you want to convert the kokoro 1.0 onnx model to sherpa-onnx, please see https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/kokoro/v1.0/run.sh
This model in sherpa-onnx supports both English and Chinese.
In the following, we describe how to download it and use it with sherpa-onnx.
Warning
It is a multi-lingual model, but we only add English and Chinese support for it.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kokoro-multi-lang-v1_0.tar.bz2
tar xf kokoro-multi-lang-v1_0.tar.bz2
rm kokoro-multi-lang-v1_0.tar.bz2
Please check that the file sizes of the pre-trained models are correct. See
the file sizes of *.onnx
files below.
ls -lh kokoro-multi-lang-v1_0/
total 718872
-rw-r--r-- 1 fangjun staff 11K Feb 7 10:16 LICENSE
-rw-r--r-- 1 fangjun staff 50B Feb 7 10:18 README.md
-rw-r--r-- 1 fangjun staff 58K Feb 7 10:18 date-zh.fst
drwxr-xr-x 9 fangjun staff 288B Apr 19 2024 dict
drwxr-xr-x 122 fangjun staff 3.8K Nov 28 2023 espeak-ng-data
-rw-r--r-- 1 fangjun staff 6.0M Feb 7 10:18 lexicon-gb-en.txt
-rw-r--r-- 1 fangjun staff 5.6M Feb 7 10:18 lexicon-us-en.txt
-rw-r--r-- 1 fangjun staff 2.3M Feb 7 10:18 lexicon-zh.txt
-rw-r--r-- 1 fangjun staff 310M Feb 7 10:18 model.onnx
-rw-r--r-- 1 fangjun staff 63K Feb 7 10:18 number-zh.fst
-rw-r--r-- 1 fangjun staff 87K Feb 7 10:18 phone-zh.fst
-rw-r--r-- 1 fangjun staff 687B Feb 7 10:18 tokens.txt
-rw-r--r-- 1 fangjun staff 26M Feb 7 10:18 voices.bin
Map between speaker ID and speaker name
This model contains 53 speakers and we use integer IDs 0-52
to represent
each speaker.
Please visit https://github.com/k2-fsa/sherpa-onnx/pull/1795 to listen to audio samples from different speakers.
The map is given below:
ID to Speaker
0->af_alloy, 1->af_aoede, 2->af_bella, 3->af_heart, 4->af_jessica, 5->af_kore, 6->af_nicole, 7->af_nova, 8->af_river, 9->af_sarah, 10->af_sky, 11->am_adam, 12->am_echo, 13->am_eric, 14->am_fenrir, 15->am_liam, 16->am_michael, 17->am_onyx, 18->am_puck, 19->am_santa, 20->bf_alice, 21->bf_emma, 22->bf_isabella, 23->bf_lily, 24->bm_daniel, 25->bm_fable, 26->bm_george, 27->bm_lewis, 28->ef_dora, 29->em_alex, 30->ff_siwis, 31->hf_alpha, 32->hf_beta, 33->hm_omega, 34->hm_psi, 35->if_sara, 36->im_nicola, 37->jf_alpha, 38->jf_gongitsune, 39->jf_nezumi, 40->jf_tebukuro, 41->jm_kumo, 42->pf_dora, 43->pm_alex, 44->pm_santa, 45->zf_xiaobei, 46->zf_xiaoni, 47->zf_xiaoxiao, 48->zf_xiaoyi,49->zm_yunjian, 50->zm_yunxi, 51->zm_yunxia, 52->zm_yunyang,Speaker to ID
af_alloy->0, af_aoede->1, af_bella->2, af_heart->3, af_jessica->4, af_kore->5, af_nicole->6, af_nova->7, af_river->8, af_sarah->9, af_sky->10, am_adam->11, am_echo->12, am_eric->13, am_fenrir->14, am_liam->15, am_michael->16, am_onyx->17, am_puck->18, am_santa->19, bf_alice->20, bf_emma->21, bf_isabella->22, bf_lily->23, bm_daniel->24, bm_fable->25, bm_george->26, bm_lewis->27, ef_dora->28, em_alex->29, ff_siwis->30, hf_alpha->31, hf_beta->32, hm_omega->33, hm_psi->34, if_sara->35, im_nicola->36, jf_alpha->37, jf_gongitsune->38, jf_nezumi->39, jf_tebukuro->40, jm_kumo->41, pf_dora->42, pm_alex->43, pm_santa->44, zf_xiaobei->45, zf_xiaoni->46, zf_xiaoxiao->47, zf_xiaoyi->48, zm_yunjian->49, zm_yunxi->50, zm_yunxia->51, zm_yunyang->52
Generate speech with executables compiled from C++
Click ▶ to see it.
cd /path/to/sherpa-onnx
for sid in $(seq 0 19); do
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=$sid \
--output-filename="./kokoro-1.0-sid-$sid-en-us.wav" \
"Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
done
for sid in $(seq 20 27); do
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=$sid \
--output-filename="./kokoro-1.0-sid-$sid-en-gb.wav" \
"Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
done
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=23 \
--output-filename="./kokoro-1.0-sid-23-en-gb.wav" \
"Liliana, the most beautiful and lovely assistant of our team"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=24 \
--output-filename="./kokoro-1.0-sid-24-en-gb.wav" \
"Liliana, the most beautiful and lovely assistant of our team"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=45 \
--output-filename="./kokoro-1.0-sid-45-zh.wav" \
"小米的核心价值观是什么?答案是真诚热爱!"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=45 \
--output-filename="./kokoro-1.0-sid-45-zh-1.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=46 \
--output-filename="./kokoro-1.0-sid-46-zh.wav" \
"小米的使命是,始终坚持做感动人心、价格厚道的好产品,让全球每个人都能享受科技带来的美好生活。"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=46 \
--output-filename="./kokoro-1.0-sid-46-zh-1.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/number-zh.fst \
--num-threads=2 \
--sid=47 \
--output-filename="./kokoro-1.0-sid-47-zh.wav" \
"35年前,他于长沙出生, 在长白山长大。9年前他当上了银行的领导,主管行政。"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=47 \
--output-filename="./kokoro-1.0-sid-47-zh-1.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/phone-zh.fst,./kokoro-multi-lang-v1_0/number-zh.fst \
--num-threads=2 \
--sid=48 \
--output-filename="./kokoro-1.0-sid-48-zh-1.wav" \
"有困难,请拨打110 或者18601200909"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=48 \
--output-filename="./kokoro-1.0-sid-48-zh-2.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/date-zh.fst,./kokoro-multi-lang-v1_0/number-zh.fst \
--num-threads=2 \
--sid=48 \
--output-filename="./kokoro-1.0-sid-48-zh.wav" \
"现在是2025年12点55分, 星期5。明天是周6,不用上班, 太棒啦!"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/date-zh.fst,./kokoro-multi-lang-v1_0/phone-zh.fst,./kokoro-multi-lang-v1_0/number-zh.fst \
--num-threads=2 \
--sid=49 \
--output-filename="./kokoro-1.0-sid-49-zh.wav" \
"根据第7次全国人口普查结果表明,我国总人口有1443497378人。普查登记的大陆31个省、自治区、直辖市和现役军人的人口共1411778724人。电话号码是110。手机号是13812345678"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=49 \
--output-filename="./kokoro-1.0-sid-49-zh-1.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=50 \
--output-filename="./kokoro-1.0-sid-50-zh.wav" \
"林美丽最美丽、最漂亮、最可爱!"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=50 \
--output-filename="./kokoro-1.0-sid-50-zh-1.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=51 \
--output-filename="./kokoro-1.0-sid-51-zh.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--num-threads=2 \
--sid=52 \
--output-filename="./kokoro-1.0-sid-52-zh.wav" \
"当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/date-zh.fst,./kokoro-multi-lang-v1_0/number-zh.fst \
--num-threads=2 \
--sid=52 \
--output-filename="./kokoro-1.0-sid-52-zh-en.wav" \
"Are you ok 是雷军2015年4月小米在印度举行新品发布会时说的。他还说过, I am very happy to be in China. 雷军事后在微博上表示 “万万没想到,视频火速传到国内,全国人民都笑了”. 现在国际米粉越来越多,我的确应该把英文学好,不让大家失望!加油!"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/date-zh.fst,./kokoro-multi-lang-v1_0/number-zh.fst \
--num-threads=2 \
--sid=1 \
--output-filename="./kokoro-1.0-sid-1-zh-en.wav" \
"Are you ok 是雷军2015年4月小米在印度举行新品发布会时说的。他还说过, I am very happy to be in China. 雷军事后在微博上表示 “万万没想到,视频火速传到国内,全国人民都笑了”. 现在国际米粉越来越多,我的确应该把英文学好,不让大家失望!加油!"
build/bin/sherpa-onnx-offline-tts \
--debug=0 \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/date-zh.fst,./kokoro-multi-lang-v1_0/number-zh.fst \
--num-threads=2 \
--sid=18 \
--output-filename="./kokoro-1.0-sid-18-zh-en.wav" \
"Are you ok 是雷军2015年4月小米在印度举行新品发布会时说的。他还说过, I am very happy to be in China. 雷军事后在微博上表示 “万万没想到,视频火速传到国内,全国人民都笑了”. 现在国际米粉越来越多,我的确应该把英文学好,不让大家失望!加油!"
After running, it will generate many .wav
files in the
current directory.
Audio samples
An example is given below:
Click ▶ to see it.
soxi ./kokoro-1.0-sid-1-zh-en.wav
Input File : './kokoro-1.0-sid-1-zh-en.wav'
Channels : 1
Sample Rate : 24000
Precision : 16-bit
Duration : 00:00:26.00 = 624008 samples ~ 1950.02 CDDA sectors
File Size : 1.25M
Bit Rate : 384k
Sample Encoding: 16-bit Signed Integer PCM
Hint
Sample rate of this model is fixed to 24000 Hz
.
Wave filename | Content | Text |
---|---|---|
kokoro-1.0-sid-0-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-1-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-2-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-3-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-4-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-5-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-6-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-7-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-8-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-9-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-10-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-11-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-12-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-13-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-14-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-15-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-16-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-17-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-18-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-19-en-us.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-20-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-21-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-22-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-23-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-24-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-25-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-23-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-24-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-25-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-26-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-27-en-gb.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." | |
kokoro-1.0-sid-45-zh.wav | "小米的核心价值观是什么?答案是真诚热爱!" | |
kokoro-1.0-sid-45-zh-1.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-46-zh.wav | "小米的使命是,始终坚持做感动人心、价格厚道的好产品,让全球每个人都能享受科技带来的美好生活。" | |
kokoro-1.0-sid-46-zh-1.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-47-zh.wav | "35年前,他于长沙出生, 在长白山长大。9年前他当上了银行的领导,主管行政。" | |
kokoro-1.0-sid-47-zh-1.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-48-zh-1.wav | "有困难,请拨打110 或者18601200909" | |
kokoro-1.0-sid-48-zh-2.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-48-zh.wav | "现在是2025年12点55分, 星期5。明天是周6,不用上班, 太棒啦!" | |
kokoro-1.0-sid-49-zh.wav | "根据第7次全国人口普查结果表明,我国总人口有1443497378人。普查登记的大陆31个省、自治区、直辖市和现役军人的人口共1411778724人。电话号码是110。手机号是13812345678" | |
kokoro-1.0-sid-49-zh-1.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-50-zh.wav | "林美丽最美丽、最漂亮、最可爱!" | |
kokoro-1.0-sid-50-zh-1.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-51-zh.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-52-zh.wav | "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." | |
kokoro-1.0-sid-52-zh-en.wav | "Are you ok 是雷军2015年4月小米在印度举行新品发布会时说的。他还说过, I am very happy to be in China. 雷军事后在微博上表示 “万万没想到,视频火速传到国内,全国人民都笑了”. 现在国际米粉越来越多,我的确应该把英文学好,不让大家失望!加油!" | |
kokoro-1.0-sid-1-zh-en.wav | "Are you ok 是雷军2015年4月小米在印度举行新品发布会时说的。他还说过, I am very happy to be in China. 雷军事后在微博上表示 “万万没想到,视频火速传到国内,全国人民都笑了”. 现在国际米粉越来越多,我的确应该把英文学好,不让大家失望!加油!" | |
kokoro-1.0-sid-18-zh-en.wav | "Are you ok 是雷军2015年4月小米在印度举行新品发布会时说的。他还说过, I am very happy to be in China. 雷军事后在微博上表示 “万万没想到,视频火速传到国内,全国人民都笑了”. 现在国际米粉越来越多,我的确应该把英文学好,不让大家失望!加油!" |
Generate speech with Python script
Please replace build/bin/sherpa-onnx-offline-tts
in the above examples
with python3 ./python-api-examples/offline-tts.py
.
or with python3 ./python-api-examples/offline-tts-play.py
.
Hint
Download offline-tts.py
Download offline-tts-play.py
RTF on Raspberry Pi 4 Model B Rev 1.5
We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
for t in 1 2 3 4; do
build/bin/sherpa-onnx-offline-tts \
--num-threads=$t \
--kokoro-model=./kokoro-multi-lang-v1_0/model.onnx \
--kokoro-voices=./kokoro-multi-lang-v1_0/voices.bin \
--kokoro-tokens=./kokoro-multi-lang-v1_0/tokens.txt \
--kokoro-data-dir=./kokoro-multi-lang-v1_0/espeak-ng-data \
--kokoro-dict-dir=./kokoro-multi-lang-v1_0/dict \
--kokoro-lexicon=./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt \
--tts-rule-fsts=./kokoro-multi-lang-v1_0/date-zh.fst,./kokoro-multi-lang-v1_0/number-zh.fst \
--sid=1 \
--output-filename="./kokoro-1.0-sid-1-en.wav" \
"你好吗?Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
done
The results are given below:
num_threads
1
2
3
4
RTF
7.635
4.470
3.430
3.191
kokoro-en-v0_19 (English, 11 speakers)
This model contains 11 speakers. The ONNX model is from https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files
The script for adding meta data to the ONNX model can be found at https://github.com/k2-fsa/sherpa-onnx/tree/master/scripts/kokoro
In the following, we describe how to download it and use it with sherpa-onnx.
Download the model
Please use the following commands to download it.
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kokoro-en-v0_19.tar.bz2
tar xf kokoro-en-v0_19.tar.bz2
rm kokoro-en-v0_19.tar.bz2
Please check that the file sizes of the pre-trained models are correct. See
the file sizes of *.onnx
files below.
ls -lh kokoro-en-v0_19/
total 686208
-rw-r--r-- 1 fangjun staff 11K Jan 15 16:23 LICENSE
-rw-r--r-- 1 fangjun staff 235B Jan 15 16:25 README.md
drwxr-xr-x 122 fangjun staff 3.8K Nov 28 2023 espeak-ng-data
-rw-r--r-- 1 fangjun staff 330M Jan 15 16:25 model.onnx
-rw-r--r-- 1 fangjun staff 1.1K Jan 15 16:25 tokens.txt
-rw-r--r-- 1 fangjun staff 5.5M Jan 15 16:25 voices.bin
Map between speaker ID and speaker name
The model contains 11 speakers and we use integer IDs 0-10
to represent.
each speaker.
The map is given below:
Speaker ID |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
Speaker Name |
af |
af_bella |
af_nicole |
af_sarah |
af_sky |
am_adam |
am_michael |
bf_emma |
bf_isabella |
bm_george |
bm_lewis |
ID | name | Test wave |
---|---|---|
0 | af | |
1 | af_bella | |
2 | af_nicole | |
3 | af_sarah | |
4 | af_sky | |
5 | am_adam | |
6 | am_michael | |
7 | bf_emma | |
8 | bf_isabella | |
9 | bm_george | |
10 | bm_lewis |
Generate speech with executables compiled from C++
cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-offline-tts \
--kokoro-model=./kokoro-en-v0_19/model.onnx \
--kokoro-voices=./kokoro-en-v0_19/voices.bin \
--kokoro-tokens=./kokoro-en-v0_19/tokens.txt \
--kokoro-data-dir=./kokoro-en-v0_19/espeak-ng-data \
--num-threads=2 \
--sid=10 \
--output-filename="./10-bm_lewis.wav" \
"Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar."
After running, it will generate a file 10-bm_lewis
in the
current directory.
soxi ./10-bm_lewis.wav
Input File : './10-bm_lewis.wav'
Channels : 1
Sample Rate : 24000
Precision : 16-bit
Duration : 00:00:15.80 = 379200 samples ~ 1185 CDDA sectors
File Size : 758k
Bit Rate : 384k
Sample Encoding: 16-bit Signed Integer PCM
Hint
Sample rate of this model is fixed to 24000 Hz
.
Wave filename | Content | Text |
---|---|---|
10-bm_lewis.wav | "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar." |
Generate speech with Python script
cd /path/to/sherpa-onnx
python3 ./python-api-examples/offline-tts.py \
--kokoro-model=./kokoro-en-v0_19/model.onnx \
--kokoro-voices=./kokoro-en-v0_19/voices.bin \
--kokoro-tokens=./kokoro-en-v0_19/tokens.txt \
--kokoro-data-dir=./kokoro-en-v0_19/espeak-ng-data \
--num-threads=2 \
--sid=2 \
--output-filename=./2-af_nicole.wav \
"Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
soxi ./2-af_nicole.wav
Input File : './2-af_nicole.wav'
Channels : 1
Sample Rate : 24000
Precision : 16-bit
Duration : 00:00:11.45 = 274800 samples ~ 858.75 CDDA sectors
File Size : 550k
Bit Rate : 384k
Sample Encoding: 16-bit Signed Integer PCM
Wave filename | Content | Text |
---|---|---|
2-af_nicole.wav | "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." |
RTF on Raspberry Pi 4 Model B Rev 1.5
We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
for t in 1 2 3 4; do
build/bin/sherpa-onnx-offline-tts \
--num-threads=$t \
--kokoro-model=./kokoro-en-v0_19/model.onnx \
--kokoro-voices=./kokoro-en-v0_19/voices.bin \
--kokoro-tokens=./kokoro-en-v0_19/tokens.txt \
--kokoro-data-dir=./kokoro-en-v0_19/espeak-ng-data \
--sid=2 \
--output-filename=./2-af_nicole.wav \
"Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
done
The results are given below:
num_threads
1
2
3
4
RTF
6.629
3.870
2.999
2.774