Run executables on your phone with adb (using model.bin)

In Build sherpa-onnx for Qualcomm NPU, we described how to build the executables. This section describes how to run them on your phone with adb, using QNN models (model.bin).

Hint

model.bin is OS-independent, but QNN-SDK-dependent and SoC-dependent.

OS-independent: A model.bin can run on both Android/arm64 and Linux/arm64.

QNN-SDK-dependent: A model.bin is tied to the version of the QNN SDK used to generate it.

SoC-dependent: A model.bin built for SM8850 cannot be used on SA8259, and vice versa.

Trade-off: In return for these dependencies, first-run initialization is extremely fast because the QNN context is pre-generated.

Alternative: If you need SoC-independence or QNN-SDK-independence, use libmodel.so. For guidance, see Run executables on your phone with adb (using libmodel.so).

Table 1 Comparison of libmodel.so and model.bin
Feature	libmodel.so	model.bin
OS Dependency	OS-dependent: cannot run across different OS/arch (e.g., Android/arm64 vs Linux/arm64)	OS-independent: can run on multiple OS/arch (e.g., Android/arm64 and Linux/arm64)
SoC Dependency	SoC-independent: can run on multiple Qualcomm chips (e.g., SM8850, SA8259, QCS9100)	SoC-dependent: built for a specific chip; cannot run on a different SoC
QNN-SDK Dependency	QNN-SDK-independent: works with any QNN SDK version	QNN-SDK-dependent: depends on the QNN SDK version used to build it
First-Run Initialization	Slow: context must be generated at runtime	Fast: context is pre-generated
Recommended Use	When SoC-independence or SDK-independence is needed	When fastest startup is required

Note: Choose libmodel.so if you need flexibility across SoCs or QNN SDK versions. Use model.bin if you want the fastest possible first-run initialization on a specific SoC.

Download a QNN model

You can find available QNN models at

https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models-qnn-binary

Hint

For libmodel.so, please see

https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models-qnn

Since QNN does not support dynamic input shapes, we limit the maximum duration the model can handle. For example, if the limit is 10 seconds, any input shorter than 10 seconds will be padded to 10 seconds, and inputs longer than 10 seconds will be truncated to that length.
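To make the padding and truncation rule concrete, here is a minimal POSIX shell sketch that computes how many samples the model actually sees. It assumes 16 kHz audio (matching sampling_rate=16000 in the recognizer log on this page); the function name qnn_effective_samples is hypothetical and not part of sherpa-onnx.

```shell
# Hypothetical helper (not part of sherpa-onnx).
# Given the number of input samples and the model's maximum duration in
# seconds, print the number of samples the model processes and whether the
# input is padded, truncated, or left unchanged. Assumes 16 kHz audio.
qnn_effective_samples() {
  num_samples=$1
  max_seconds=$2
  sample_rate=16000
  max_samples=$((max_seconds * sample_rate))
  if [ "$num_samples" -lt "$max_samples" ]; then
    echo "$max_samples padded"
  elif [ "$num_samples" -gt "$max_samples" ]; then
    echo "$max_samples truncated"
  else
    echo "$max_samples unchanged"
  fi
}

# test_wavs/zh.wav is 5.592 s long, i.e. 89472 samples at 16 kHz.
qnn_effective_samples 89472 10   # prints: 160000 padded
```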

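The model name indicates the maximum duration the model can handle; for example, 10-seconds in the model name used below means the model handles up to 10 seconds of audio.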
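If you script model selection, the maximum duration can be read from the name. This sketch assumes the naming pattern of the example model on this page (…-binary-10-seconds-…); qnn_model_max_seconds is a hypothetical helper, not part of sherpa-onnx.

```shell
# Hypothetical helper: print N from the "-N-seconds-" part of a QNN model
# name. Prints nothing if the name does not contain that pattern.
qnn_model_max_seconds() {
  printf '%s\n' "$1" | sed -n 's/.*-\([0-9][0-9]*\)-seconds-.*/\1/p'
}

qnn_model_max_seconds sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
# prints: 10
```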

Caution

I am using a Xiaomi 17 Pro (SM8850) for testing, so I selected a model with SM8850 in its name.
Make sure to select a model that matches your own device.
For example, if you are testing on a Samsung Galaxy S23 Ultra, which uses the SM8550 SoC, you should select a model with SM8550 in its name instead of SM8850.
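A small sanity check can catch a mismatched download before you copy 242 MB to the phone. This is a sketch: qnn_model_matches_soc is a hypothetical helper, and it assumes the SoC appears in the model name as a dash-separated component (as in -SM8850-).

```shell
# Hypothetical helper: exit status 0 if the model name contains the given
# SoC name as a dash-separated component, e.g. "-SM8850-".
qnn_model_matches_soc() {
  soc=$1
  model_name=$2
  case "$model_name" in
    *"-$soc-"*) return 0 ;;
    *) return 1 ;;
  esac
}

name=sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
if qnn_model_matches_soc SM8550 "$name"; then
  echo "OK: model matches SM8550"
else
  echo "Mismatch: pick a model with SM8550 in its name"
fi
```

On Android 12 and later, the SoC model can usually be queried with adb shell getprop ro.soc.model; on older releases this property may be missing, so check your device's specifications instead.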

We use sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2 as an example below:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
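The same download steps can be kept in variables so that switching to a model for another SoC only means editing one line. A sketch; the helper names qnn_model_url and download_qnn_model are hypothetical.

```shell
# Sketch: the download/extract/cleanup steps above, parameterized by model name.
MODEL=sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
BASE_URL=https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary

# Print the release URL of a model archive.
qnn_model_url() {
  echo "$BASE_URL/$1.tar.bz2"
}

# Download, extract, and delete the archive; stops at the first failure.
download_qnn_model() {
  wget "$(qnn_model_url "$1")" &&
    tar xvf "$1.tar.bz2" &&
    rm "$1.tar.bz2"
}

# Uncomment to download and extract:
# download_qnn_model "$MODEL"
```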

You should see the following files:

ls -lh sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/
total 526984
-rw-r--r--@ 1 fangjun  staff    23B  9 Dec 16:40 info.txt
-rw-r--r--@ 1 fangjun  staff    71B  9 Dec 16:40 LICENSE
-rw-r--r--@ 1 fangjun  staff   242M  9 Dec 16:40 model.bin
-rw-r--r--@ 1 fangjun  staff   104B  9 Dec 16:40 README.md
drwxr-xr-x@ 7 fangjun  staff   224B  9 Dec 16:40 test_wavs
-rw-r--r--@ 1 fangjun  staff   308K  9 Dec 16:40 tokens.txt

Copy files to your phone

We assume you put all files in the directory /data/local/tmp/binary on your phone.

# Run on your computer

adb shell mkdir -p /data/local/tmp/binary

Copy model files

# Run on your computer

adb push ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8 /data/local/tmp/binary/

Copy sherpa-onnx executable files

# Run on your computer

adb push ./build-android-arm64-v8a/install/bin/sherpa-onnx-offline /data/local/tmp/binary/

Copy sherpa-onnx library files

# Run on your computer

adb push ./build-android-arm64-v8a/install/lib/libonnxruntime.so /data/local/tmp/binary/

Hint

You don’t need to copy libsherpa-onnx-jni.so in this case, since sherpa-onnx-offline is a standalone executable and does not use JNI.
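The copy steps so far can be collected into one function. This sketch uses the paths from this section; run_adb and push_sherpa_onnx_files are hypothetical names, and DRY_RUN=1 only prints the adb commands instead of running them.

```shell
# Sketch (run on your computer): the copy steps above in one place.
DEVICE_DIR=/data/local/tmp/binary
MODEL=sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
INSTALL_DIR=./build-android-arm64-v8a/install

# Run adb, or only print the command when DRY_RUN=1.
run_adb() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo adb "$@"
  else
    adb "$@"
  fi
}

# Create the target directory and push the model, executable, and library.
push_sherpa_onnx_files() {
  run_adb shell mkdir -p "$DEVICE_DIR" &&
    run_adb push "./$MODEL" "$DEVICE_DIR/" &&
    run_adb push "$INSTALL_DIR/bin/sherpa-onnx-offline" "$DEVICE_DIR/" &&
    run_adb push "$INSTALL_DIR/lib/libonnxruntime.so" "$DEVICE_DIR/"
}

# Usage: DRY_RUN=1 push_sherpa_onnx_files   (preview the commands)
#        push_sherpa_onnx_files             (copy for real)
```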

Copy QNN library files

Before you continue, make sure you have followed Download QNN SDK to download the QNN SDK and set the environment variable QNN_SDK_ROOT.

You should run:

echo $QNN_SDK_ROOT

to check that it points to the QNN SDK directory.
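Beyond echo, a slightly stricter check is to confirm that the Android/arm64 libraries pushed in the next step actually exist under QNN_SDK_ROOT. A sketch; check_qnn_sdk_root is a hypothetical helper.

```shell
# Sketch (run on your computer): verify QNN_SDK_ROOT is set and contains the
# aarch64-android libraries used in this section.
check_qnn_sdk_root() {
  if [ -z "${QNN_SDK_ROOT:-}" ]; then
    echo "QNN_SDK_ROOT is not set" >&2
    return 1
  fi
  for lib in libQnnHtp.so libQnnHtpPrepare.so libQnnSystem.so; do
    if [ ! -f "$QNN_SDK_ROOT/lib/aarch64-android/$lib" ]; then
      echo "Missing $QNN_SDK_ROOT/lib/aarch64-android/$lib" >&2
      return 1
    fi
  done
  echo "QNN_SDK_ROOT looks good: $QNN_SDK_ROOT"
}
```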

Warning

We used QNN SDK v2.40.0.251030 to generate model.bin.

If you use a different QNN SDK version, you must regenerate model.bin yourself.

# Run on your computer

adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so /data/local/tmp/binary/
adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/binary/
adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so /data/local/tmp/binary/

# My Xiaomi 17 Pro uses SM8850, which corresponds to HTP arch v81, so I chose
# libQnnHtpV81Stub.so and libQnnHtpV81Skel.so.
#
# Please update them to match your device.
#
adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV81Stub.so /data/local/tmp/binary/

adb push $QNN_SDK_ROOT/lib/hexagon-v81/unsigned/libQnnHtpV81Skel.so /data/local/tmp/binary/
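Since only the HTP arch number differs between devices, the Stub/Skel paths can be derived from it. A sketch: the value 81 for SM8850 comes from the comment above, the value for other SoCs is not listed here (look it up for your device), and the helper names are hypothetical.

```shell
# Sketch (run on your computer): derive Stub/Skel paths from the HTP arch.
qnn_htp_stub() {
  echo "$QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV${1}Stub.so"
}

qnn_htp_skel() {
  echo "$QNN_SDK_ROOT/lib/hexagon-v${1}/unsigned/libQnnHtpV${1}Skel.so"
}

# Push both arch-specific libraries for the given HTP arch.
push_htp_arch_libs() {
  adb push "$(qnn_htp_stub "$1")" /data/local/tmp/binary/ &&
    adb push "$(qnn_htp_skel "$1")" /data/local/tmp/binary/
}

# Usage: push_htp_arch_libs 81   (SM8850)
```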

Run it!

adb shell

The following commands are run on your phone. First, change to the directory where you copied the files:

cd /data/local/tmp/binary

Check files

Check that you have copied all the files using the commands above:

Screenshot of the expected files on your phone

Screenshot of the expected model files on your phone
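Instead of comparing against the screenshots by eye, you can list missing files from the shell. A sketch to run on the phone inside /data/local/tmp/binary; it assumes the SM8850/V81 file names used in this section (adjust them for your device), and check_device_files is a hypothetical helper.

```shell
# Sketch (run on your phone): report files from the copy steps that are missing.
MODEL=sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8

check_device_files() {
  dir=${1:-.}
  missing=0
  for f in sherpa-onnx-offline libonnxruntime.so \
           libQnnHtp.so libQnnHtpPrepare.so libQnnSystem.so \
           libQnnHtpV81Stub.so libQnnHtpV81Skel.so \
           "$MODEL/model.bin" "$MODEL/tokens.txt" "$MODEL/test_wavs/zh.wav"; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return $missing
}

# Usage: check_device_files && echo "all files present"
```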

Set environment variable ADSP_LIBRARY_PATH

export ADSP_LIBRARY_PATH="$PWD;$ADSP_LIBRARY_PATH"

where $PWD is /data/local/tmp/binary in this case.

Caution

Please use ; (semicolon), not : (colon), as the separator.

Using export ADSP_LIBRARY_PATH="$PWD:$ADSP_LIBRARY_PATH" is an error.

Screenshot of setting ADSP_LIBRARY_PATH
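If you set ADSP_LIBRARY_PATH repeatedly (for example, from a script), a small helper keeps the ; separator in one place. This is a sketch with a hypothetical function name; it only differs from the export one-liner above by not leaving an empty trailing entry when the variable was previously unset.

```shell
# Sketch (run on your phone): prepend a directory to ADSP_LIBRARY_PATH,
# using ';' (not ':') as the separator.
prepend_adsp_library_path() {
  if [ -z "${ADSP_LIBRARY_PATH:-}" ]; then
    ADSP_LIBRARY_PATH=$1
  else
    ADSP_LIBRARY_PATH="$1;$ADSP_LIBRARY_PATH"
  fi
  export ADSP_LIBRARY_PATH
}

# Usage: prepend_adsp_library_path "$PWD"
```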

Run sherpa-onnx-offline

Caution

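Make sure you have set the environment variable ADSP_LIBRARY_PATH as described above. Otherwise, the Skel library (libQnnHtpV81Skel.so) will not be found and running on the NPU will fail.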

./sherpa-onnx-offline \
  --provider=qnn \
  --tokens=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt \
  --sense-voice.qnn-backend-lib=./libQnnHtp.so \
  --sense-voice.qnn-system-lib=./libQnnSystem.so \
  --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin \
  ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav

Alternatively, write it on a single line:

./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
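To avoid repeating the long model directory name, the command above can be wrapped with the directory in a variable. A sketch to run on the phone from /data/local/tmp/binary after ADSP_LIBRARY_PATH is set; run_sense_voice_qnn is a hypothetical name, and it uses only the flags from the command above.

```shell
# Sketch (run on your phone): the sherpa-onnx-offline command above with the
# model directory in a variable. Defaults to test_wavs/zh.wav.
MODEL=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8

run_sense_voice_qnn() {
  wav=${1:-$MODEL/test_wavs/zh.wav}
  ./sherpa-onnx-offline \
    --provider=qnn \
    --tokens="$MODEL/tokens.txt" \
    --sense-voice.qnn-backend-lib=./libQnnHtp.so \
    --sense-voice.qnn-system-lib=./libQnnSystem.so \
    --sense-voice.qnn-context-binary="$MODEL/model.bin" \
    "$wav"
}

# Usage: run_sense_voice_qnn                 (decodes test_wavs/zh.wav)
#        run_sense_voice_qnn ./your-file.wav (any 16 kHz wave file)
```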

The log output is shown below:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin", system_lib="./libQnnSystem.so"), language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 1.214 s
Started
Done!

./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开饭时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74], "durations": [], "tokens":["开", "饭", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.290 s
Real time factor (RTF): 0.290 / 5.592 = 0.052
     0.0ms [WARN   ] QnnDsp <W> Initializing HtpProvider
     0.0ms [WARN   ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
     0.0ms [WARN   ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList

Please ignore the num_threads information in the log; it is not used by the QNN provider.

Hint

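The RTF above is computed with respect to the actual audio duration (5.592 seconds). Since the input is padded to 10 seconds, the model actually processed 10 seconds of audio, so the RTF relative to the processed audio is even smaller.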
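The RTF is processing time divided by audio duration. A small awk-based sketch (rtf is a hypothetical helper) reproduces the value from the log and the value relative to the 10 seconds the model actually processed:

```shell
# Sketch: real time factor = elapsed seconds / audio seconds, 3 decimals.
rtf() {
  awk -v t="$1" -v d="$2" 'BEGIN { printf "%.3f\n", t / d }'
}

rtf 0.290 5.592   # prints 0.052 (relative to the real duration of zh.wav)
rtf 0.290 10      # prints 0.029 (relative to the padded 10 s input)
```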

Congratulations

Congratulations! You have successfully run sherpa-onnx on your phone, leveraging the Qualcomm NPU via QNN with the HTP backend.