On-device speaker identification (本地说话人识别)

This page describes how to build the SherpaOnnxSpeakerIdentification project for on-device speaker identification on HarmonyOS.

Open the project with DevEco Studio

First, download the code:

# Assume we place it inside /Users/fangjun/open-source
# You can place it anywhere you like.

cd /Users/fangjun/open-source/

git clone https://github.com/k2-fsa/sherpa-onnx
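
Before opening DevEco Studio, you may want to confirm that the clone contains the project folder used in the steps below. The following is a minimal sketch, not part of sherpa-onnx; the function name check_project_dir is hypothetical, and it only assumes the harmony-os/SherpaOnnxSpeakerIdentification layout used on this page.

```shell
# Hypothetical helper (not part of sherpa-onnx): print the path of the
# HarmonyOS project inside a cloned sherpa-onnx repository, or fail if
# the folder is missing.
check_project_dir() {
  local repo_root="$1"
  local project_dir="$repo_root/harmony-os/SherpaOnnxSpeakerIdentification"
  if [ -d "$project_dir" ]; then
    echo "$project_dir"
  else
    echo "Not found: $project_dir" >&2
    return 1
  fi
}
```

For the location used on this page, you would call it as check_project_dir /Users/fangjun/open-source/sherpa-onnx and open the printed folder in DevEco Studio.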

Then start DevEco Studio and follow the screenshots below:

Fig. 43 Step 1: Click Open (screenshot of starting DevEco Studio)

Fig. 44 Step 2: Select SherpaOnnxSpeakerIdentification inside the harmony-os folder and click Open (screenshot of selecting the project)

Fig. 45 Step 3: Check that it is using the latest version. You can visit sherpa_onnx to check available versions. (screenshot of the version check)

Select a model

The code supports many models for extracting speaker embeddings; you must select one of them.

You can find all supported models at

https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models

We use the following model

https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_200k_sv_zh-cn_16k-common.onnx

as an example in this document.
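
If you pick a different model from the release page, its download URL should follow the same pattern as the one above. The sketch below illustrates this; the variable RELEASE_BASE and the function model_url are hypothetical names introduced here, and the sketch assumes every model is attached to the same release tag.

```shell
# Base URL of the release that hosts the models. The tag is spelled
# "speaker-recongition-models" in the URLs on this page; keep it as-is.
RELEASE_BASE="https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models"

# Hypothetical helper: print the download URL for a model file name.
# Assumes the file is an asset of the release above.
model_url() {
  echo "$RELEASE_BASE/$1"
}
```

Calling model_url 3dspeaker_speech_eres2net_base_200k_sv_zh-cn_16k-common.onnx prints the URL of the example model used in this document.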

Use 3dspeaker_speech_eres2net_base_200k_sv_zh-cn_16k-common.onnx

First, we download it to the rawfile directory.

Caution: You MUST place the file inside the rawfile directory. Otherwise, you will be SAD later.

cd /Users/fangjun/open-source/sherpa-onnx/harmony-os/SherpaOnnxSpeakerIdentification/entry/src/main/resources/rawfile

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_200k_sv_zh-cn_16k-common.onnx

Please check that your directory looks exactly like the following:

(py38) fangjuns-MacBook-Pro:rawfile fangjun$ pwd
/Users/fangjun/open-source/sherpa-onnx/harmony-os/SherpaOnnxSpeakerIdentification/entry/src/main/resources/rawfile

(py38) fangjuns-MacBook-Pro:rawfile fangjun$ ls -lh
total 77888
-rw-r--r--  1 fangjun  staff    38M Oct 14 11:41 3dspeaker_speech_eres2net_base_200k_sv_zh-cn_16k-common.onnx
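
Instead of comparing the listing by eye, you can script the same check. This is a minimal sketch under stated assumptions: the function name check_model is hypothetical, and it only verifies that the file exists and is not empty, not that its content is a valid model.

```shell
# Hypothetical helper: succeed only if the model file exists in the given
# rawfile directory and is not empty (an empty file suggests a failed
# download).
check_model() {
  local rawfile_dir="$1"
  local model_name="$2"
  local model_path="$rawfile_dir/$model_name"
  if [ -s "$model_path" ]; then
    echo "OK: $model_path"
  else
    echo "Missing or empty: $model_path" >&2
    return 1
  fi
}
```

Run it from anywhere by passing the rawfile directory shown above as the first argument and 3dspeaker_speech_eres2net_base_200k_sv_zh-cn_16k-common.onnx as the second.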