Run executables on your phone with adb (using model.bin)

Build sherpa-onnx for Qualcomm NPU describes how to generate the executable files. This section describes how to run them with QNN models (model.bin) on your phone using adb.

Hint

model.bin is OS-independent, but QNN-SDK-dependent and SoC-dependent.

  • OS-independent: A model.bin can run on both Android/arm64 and Linux/arm64.

  • QNN-SDK-dependent: Once built, model.bin depends on the version of the QNN SDK used during its creation.

  • SoC-dependent: A model.bin built for SM8850 cannot be used on SA8259, and vice versa.

  • Trade-off: The first-run initialization is extremely fast because the context is pre-generated.

  • Alternative: If you need SoC-independence or QNN-SDK-independence, use libmodel.so. For guidance, see Run executables on your phone with adb (using libmodel.so).

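Table 1 Comparison of libmodel.so and model.bin

+--------------------------+---------------------------------+----------------------------------+
| Feature                  | libmodel.so                     | model.bin                        |
+==========================+=================================+==================================+
| OS Dependency            | OS-dependent: cannot run across | OS-independent: can run on       |
|                          | different OS/arch               | multiple OS/arch                 |
|                          | (e.g., Android/arm64            | (e.g., Android/arm64             |
|                          | vs Linux/arm64)                 | and Linux/arm64)                 |
+--------------------------+---------------------------------+----------------------------------+
| SoC Dependency           | SoC-independent: can run        | SoC-dependent: built for         |
|                          | on multiple Qualcomm chips      | a specific chip;                 |
|                          | (e.g., SM8850, SA8259, QCS9100) | cannot run on a different SoC    |
+--------------------------+---------------------------------+----------------------------------+
| QNN-SDK Dependency       | QNN-SDK-independent: works      | QNN-SDK-dependent: depends       |
|                          | with any QNN SDK version        | on the QNN SDK version           |
|                          |                                 | used to build it                 |
+--------------------------+---------------------------------+----------------------------------+
| First-Run Initialization | Slow: context must be           | Fast: context is                 |
|                          | generated at runtime            | pre-generated                    |
+--------------------------+---------------------------------+----------------------------------+
| Recommended Use          | When SoC-independence or        | When fastest startup is required |
|                          | SDK-independence is needed      |                                  |
+--------------------------+---------------------------------+----------------------------------+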

Note: Choose libmodel.so if you need flexibility across SoCs or QNN SDK versions. Use model.bin if you want the fastest possible first-run initialization on a specific SoC.

Download a QNN model

You can find available QNN models at

Since QNN does not support dynamic input shapes, we limit the maximum duration the model can handle. For example, if the limit is 10 seconds, any input shorter than 10 seconds will be padded to 10 seconds, and inputs longer than 10 seconds will be truncated to that length.

The model name indicates the maximum duration the model can handle.
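The naming convention can be read mechanically. Below is a minimal sketch, assuming model names follow the pattern sherpa-onnx-qnn-<SoC>-binary-<N>-seconds-... seen in the example used in this section. The function name parse_qnn_model_name is hypothetical and is not part of sherpa-onnx.

```shell
# Hypothetical helper (not part of sherpa-onnx): extract the SoC and the
# maximum duration from a model name of the assumed form
#   sherpa-onnx-qnn-<SoC>-binary-<N>-seconds-...
parse_qnn_model_name() {
  soc=$(printf '%s\n' "$1" | sed -n 's/^sherpa-onnx-qnn-\([A-Za-z0-9]*\)-binary-.*/\1/p')
  secs=$(printf '%s\n' "$1" | sed -n 's/.*-binary-\([0-9][0-9]*\)-seconds-.*/\1/p')
  if [ -z "$soc" ] || [ -z "$secs" ]; then
    echo "unrecognized model name: $1" >&2
    return 1
  fi
  echo "soc=$soc max_seconds=$secs"
}

parse_qnn_model_name sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
# prints: soc=SM8850 max_seconds=10
```

Comparing the printed SoC with the SoC of your phone before downloading is a quick way to avoid picking a model.bin that cannot run on your device.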

Caution

  • I am using a Xiaomi 17 Pro for testing, so I selected a model with SM8850 in its name.

  • Make sure to select a model that matches your own device.

  • Suppose you are testing on a Samsung Galaxy S23 Ultra, which uses the SM8550 SoC. In this case, you should select a model with SM8550 in its name instead of SM8850.

We use sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2 as an example below:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2

You should see the following files:

ls -lh sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/
total 526984
-rw-r--r--@ 1 fangjun  staff    23B  9 Dec 16:40 info.txt
-rw-r--r--@ 1 fangjun  staff    71B  9 Dec 16:40 LICENSE
-rw-r--r--@ 1 fangjun  staff   242M  9 Dec 16:40 model.bin
-rw-r--r--@ 1 fangjun  staff   104B  9 Dec 16:40 README.md
drwxr-xr-x@ 7 fangjun  staff   224B  9 Dec 16:40 test_wavs
-rw-r--r--@ 1 fangjun  staff   308K  9 Dec 16:40 tokens.txt

Copy files to your phone

We assume you put the files in the directory /data/local/tmp/binary on your phone.

# Run on your computer

adb shell mkdir -p /data/local/tmp/binary

Copy model files

# Run on your computer

adb push ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8 /data/local/tmp/binary/

Copy sherpa-onnx executable files

# Run on your computer

adb push ./build-android-arm64-v8a/install/bin/sherpa-onnx-offline /data/local/tmp/binary/

Copy sherpa-onnx library files

# Run on your computer

adb push ./build-android-arm64-v8a/install/lib/libonnxruntime.so /data/local/tmp/binary/

Hint

You don’t need to copy libsherpa-onnx-jni.so in this case.

Copy QNN library files

Before you continue, we assume you have followed Download QNN SDK to download the QNN SDK and set the environment variable QNN_SDK_ROOT.

You should run:

echo $QNN_SDK_ROOT

to check that it points to the QNN SDK directory.

Warning

We used QNN SDK v2.40.0.251030 to generate model.bin.

If you use a different QNN SDK version, you need to regenerate model.bin yourself.

# Run on your computer

adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so /data/local/tmp/binary/
adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/binary/
adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so /data/local/tmp/binary/

# My Xiaomi 17 Pro uses SM8850, which corresponds to HTP arch 81, so I choose
# libQnnHtpV81Stub.so and libQnnHtpV81Skel.so
#
# Please update the version number to match your device
#
adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV81Stub.so /data/local/tmp/binary/

adb push $QNN_SDK_ROOT/lib/hexagon-v81/unsigned/libQnnHtpV81Skel.so /data/local/tmp/binary/
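The five pushes above differ only in the HTP architecture version, so the list can be generated. The following is a minimal sketch, not an official tool: qnn_libs_for_htp is a hypothetical helper, /path/to/qnn-sdk is a placeholder, and it assumes that other HTP versions use the same directory and file naming pattern as version 81 shown above. It only prints the adb commands (a dry run) so that you can inspect them before running them on your computer.

```shell
# Hypothetical helper: print the QNN library paths for a given SDK root and
# HTP architecture version (81 for SM8850, as used above).
# Assumption: other versions follow the same naming pattern inside the SDK.
qnn_libs_for_htp() {
  root=$1
  v=$2
  for f in libQnnHtp.so libQnnHtpPrepare.so libQnnSystem.so "libQnnHtpV${v}Stub.so"; do
    echo "$root/lib/aarch64-android/$f"
  done
  echo "$root/lib/hexagon-v${v}/unsigned/libQnnHtpV${v}Skel.so"
}

# Dry run: show the adb commands instead of executing them.
qnn_libs_for_htp "${QNN_SDK_ROOT:-/path/to/qnn-sdk}" 81 | while IFS= read -r f; do
  echo adb push "$f" /data/local/tmp/binary/
done
```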

Run it!

adb shell

The following commands are run on your phone.

Check files

First, check that you have followed the commands above to copy the files:

[Screenshot: expected files on your phone]

[Screenshot: expected model files on your phone]
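The same check can be scripted. Below is a minimal sketch to run on your phone inside /data/local/tmp/binary. check_files is a hypothetical helper, and the file list is simply the set of files copied in the previous steps for an SM8850 device. Adjust the V81 names if your device uses a different HTP version.

```shell
# Hypothetical helper: report every expected file that is missing from a
# directory; returns non-zero if at least one file is missing.
check_files() {
  dir=$1
  shift
  missing=0
  for f in "$@"; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

model=sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
if check_files . \
  sherpa-onnx-offline libonnxruntime.so \
  libQnnHtp.so libQnnHtpPrepare.so libQnnSystem.so \
  libQnnHtpV81Stub.so libQnnHtpV81Skel.so \
  "$model/model.bin" "$model/tokens.txt" "$model/test_wavs/zh.wav"; then
  echo "all files present"
else
  echo "copy the missing files before you continue"
fi
```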

Set environment variable ADSP_LIBRARY_PATH

export ADSP_LIBRARY_PATH="$PWD;$ADSP_LIBRARY_PATH"

where $PWD is /data/local/tmp/binary in this case.

Caution

Please use ;, not :.

It is an error to use export ADSP_LIBRARY_PATH="$PWD:$ADSP_LIBRARY_PATH"

[Screenshot: setting ADSP_LIBRARY_PATH]
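Because the wrong separator is easy to type, a small guard can catch it before you run anything. This is a minimal sketch: check_adsp_library_path is a hypothetical helper, and it assumes that none of your directory names contains a colon.

```shell
# Hypothetical helper: reject a value intended for ADSP_LIBRARY_PATH that is
# empty or that uses ':' as the separator instead of ';'.
check_adsp_library_path() {
  case "$1" in
    "")
      echo "ADSP_LIBRARY_PATH is empty"
      return 1
      ;;
    *:*)
      echo "use ';' instead of ':' in: $1"
      return 1
      ;;
    *)
      echo "ok: $1"
      ;;
  esac
}

check_adsp_library_path "${ADSP_LIBRARY_PATH:-}" || echo "please fix ADSP_LIBRARY_PATH before running"
```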

Run sherpa-onnx-offline

Caution

You would be sad if you did not set the environment variable ADSP_LIBRARY_PATH.

./sherpa-onnx-offline \
  --provider=qnn \
  --tokens=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt \
  --sense-voice.qnn-backend-lib=./libQnnHtp.so \
  --sense-voice.qnn-system-lib=./libQnnSystem.so \
  --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin \
  ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav

or write it in a single line:

./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
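The long model directory name appears three times in the command, so a small wrapper can make it easier to switch models or wave files. This is only a sketch: run_sense_voice_qnn and the SHERPA_ONNX_OFFLINE variable are hypothetical names introduced here for illustration, not options of sherpa-onnx. The flags are exactly the ones used in the command above.

```shell
# Hypothetical wrapper: build the sherpa-onnx-offline command from a model
# directory and a wave file. SHERPA_ONNX_OFFLINE is an illustrative override
# for the executable path; it is not an option of sherpa-onnx itself.
run_sense_voice_qnn() {
  model_dir=$1
  wav=$2
  "${SHERPA_ONNX_OFFLINE:-./sherpa-onnx-offline}" \
    --provider=qnn \
    --tokens="$model_dir/tokens.txt" \
    --sense-voice.qnn-backend-lib=./libQnnHtp.so \
    --sense-voice.qnn-system-lib=./libQnnSystem.so \
    --sense-voice.qnn-context-binary="$model_dir/model.bin" \
    "$wav"
}

# Usage on the phone (inside /data/local/tmp/binary):
#   d=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
#   run_sense_voice_qnn "$d" "$d/test_wavs/zh.wav"
```

Setting SHERPA_ONNX_OFFLINE=echo prints the arguments instead of running the recognizer, which is handy for checking the paths first.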

You can also find the log below:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin", system_lib="./libQnnSystem.so"), language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 1.214 s
Started
Done!

./sherpa-onnx-qnn-SM8850-binary-10-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开饭时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.30, 3.90, 4.20, 4.56, 4.74], "durations": [], "tokens":["开", "饭", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.290 s
Real time factor (RTF): 0.290 / 5.592 = 0.052
     0.0ms [WARN   ] QnnDsp <W> Initializing HtpProvider
     0.0ms [WARN   ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
     0.0ms [WARN   ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList

Please ignore the num threads value in the log. It is not used by the QNN provider.

Hint

The 5.592-second input was padded to 10 seconds before inference, so the model actually processed 10 seconds of audio. Measured against that duration, the RTF is even smaller.
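The arithmetic can be reproduced on your computer with the numbers from the log. This is a minimal sketch: rtf is a hypothetical helper, and it treats the whole elapsed time as time spent on the padded 10-second input.

```shell
# Hypothetical helper: real time factor = elapsed seconds / audio seconds.
rtf() {
  awk -v e="$1" -v d="$2" 'BEGIN { printf "%.3f\n", e / d }'
}

rtf 0.290 5.592   # RTF as reported in the log, prints 0.052
rtf 0.290 10      # RTF relative to the 10 seconds actually processed, prints 0.029
```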

Congratulations

Congratulations! You have successfully launched sherpa-onnx on your phone, leveraging Qualcomm NPU via QNN with the HTP backend.