Export LSTM transducer models to ncnn
We use the pre-trained model from the following repository as an example:
https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
We will show you step by step how to export it to ncnn and run it with sherpa-ncnn.
Hint
We use Ubuntu 18.04, torch 1.13, and Python 3.8 for testing.
Caution
torch > 2.0 may not work. If you get errors while building pnnx, please switch to torch < 2.0.
1. Download the pre-trained model
Hint
You have to install git-lfs before you continue.
cd egs/librispeech/ASR
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
git lfs pull --include "exp/pretrained-iter-468000-avg-16.pt"
git lfs pull --include "data/lang_bpe_500/bpe.model"
cd ..
Note
We downloaded exp/pretrained-xxx.pt, not exp/cpu-jit_xxx.pt.
In the above code, we downloaded the pre-trained model into the directory egs/librispeech/ASR/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03.
2. Install ncnn and pnnx
Please refer to 2. Install ncnn and pnnx.
3. Export the model via torch.jit.trace()
First, let us rename our pre-trained model:
cd egs/librispeech/ASR
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp
ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt
cd ../..
Next, we use the following code to export our model:
dir=./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
./lstm_transducer_stateless2/export-for-ncnn.py \
--exp-dir $dir/exp \
--tokens $dir/data/lang_bpe_500/tokens.txt \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--num-encoder-layers 12 \
--encoder-dim 512 \
--rnn-hidden-size 1024
Hint
We have renamed our model to epoch-99.pt so that we can use --epoch 99.
There is only one pre-trained model, so we use --avg 1 --use-averaged-model 0.
If you have trained a model by yourself and have all checkpoints available, please first use decode.py to tune --epoch --avg and select the best combination with --use-averaged-model 1, as sketched after this hint.
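A minimal sketch of such a tuning run is given below. The epoch/avg values are placeholders to be tuned; only --epoch, --avg, --use-averaged-model, and --exp-dir are taken from this document, and the remaining flags of decode.py are left at their defaults:
./lstm_transducer_stateless2/decode.py \
  --epoch 35 \
  --avg 10 \
  --use-averaged-model 1 \
  --exp-dir ./lstm_transducer_stateless2/exp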
Note
You will see the following log output:
2023-02-17 11:22:42,862 INFO [export-for-ncnn.py:222] device: cpu
2023-02-17 11:22:42,865 INFO [export-for-ncnn.py:231] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'dim_feedforward': 2048, 'decoder_dim': 512, 'joiner_dim': 512, 'is_pnnx': False, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '62e404dd3f3a811d73e424199b3408e309c06e1a', 'k2-git-date': 'Mon Jan 30 10:26:16 2023', 'lhotse-version': '1.12.0.dev+missing.version.file', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': False, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'master', 'icefall-git-sha1': '6d7a559-dirty', 'icefall-git-date': 'Thu Feb 16 19:47:54 2023', 'icefall-path': '/star-fj/fangjun/open-source/icefall-2', 'k2-path': '/star-fj/fangjun/open-source/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-fj/fangjun/open-source/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-3-1220120619-7695ff496b-s9n4w', 'IP address': '10.177.6.147'}, 'epoch': 99, 'iter': 0, 'avg': 1, 'exp_dir': PosixPath('icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp'), 'bpe_model': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/bpe.model', 'context_size': 2, 'use_averaged_model': False, 'num_encoder_layers': 12, 'encoder_dim': 512, 'rnn_hidden_size': 1024, 'aux_layer_period': 0, 'blank_id': 0, 'vocab_size': 500}
2023-02-17 11:22:42,865 INFO [export-for-ncnn.py:235] About to create model
2023-02-17 11:22:43,239 INFO [train.py:472] Disable giga
2023-02-17 11:22:43,249 INFO [checkpoint.py:112] Loading checkpoint from icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/epoch-99.pt
2023-02-17 11:22:44,595 INFO [export-for-ncnn.py:324] encoder parameters: 83137520
2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:325] decoder parameters: 257024
2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:326] joiner parameters: 781812
2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:327] total parameters: 84176356
2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:329] Using torch.jit.trace()
2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:331] Exporting encoder
2023-02-17 11:22:48,182 INFO [export-for-ncnn.py:158] Saved to icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.pt
2023-02-17 11:22:48,183 INFO [export-for-ncnn.py:335] Exporting decoder
/star-fj/fangjun/open-source/icefall-2/egs/librispeech/ASR/lstm_transducer_stateless2/decoder.py:101: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
need_pad = bool(need_pad)
2023-02-17 11:22:48,259 INFO [export-for-ncnn.py:180] Saved to icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.pt
2023-02-17 11:22:48,259 INFO [export-for-ncnn.py:339] Exporting joiner
2023-02-17 11:22:48,304 INFO [export-for-ncnn.py:207] Saved to icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.pt
The log shows the model has 84176356 parameters, i.e., ~84 M.
ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt
-rw-r--r-- 1 kuangfangjun root 324M Feb 17 10:34 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt
You can see that the file size of the pre-trained model is 324 MB, which is roughly equal to 84176356*4/1024/1024 = 321.107 MB.
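As a quick sanity check, each float32 parameter occupies 4 bytes, so the expected size is:
python3 -c 'print(84176356 * 4 / 1024 / 1024)'  # ~321.1 (MB), close to the 324 MB on disk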
After running lstm_transducer_stateless2/export-for-ncnn.py, we will get the following files:
ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*pnnx.pt
-rw-r--r-- 1 kuangfangjun root 1010K Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.pt
-rw-r--r-- 1 kuangfangjun root 318M Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.pt
-rw-r--r-- 1 kuangfangjun root 3.0M Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.pt
4. Export torchscript model via pnnx
Hint
Make sure you have set up the PATH environment variable in 2. Install ncnn and pnnx. Otherwise, it will throw an error saying that pnnx could not be found.
Now, it’s time to export our models to ncnn via pnnx.
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
pnnx ./encoder_jit_trace-pnnx.pt
pnnx ./decoder_jit_trace-pnnx.pt
pnnx ./joiner_jit_trace-pnnx.pt
It will generate the following files:
ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*ncnn*{bin,param}
-rw-r--r-- 1 kuangfangjun root 503K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 437 Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param
-rw-r--r-- 1 kuangfangjun root 159M Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 21K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param
-rw-r--r-- 1 kuangfangjun root 1.5M Feb 17 11:33 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 488 Feb 17 11:33 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param
There are two types of files:
- param: It is a text file containing the model architecture. You can use a text editor to view its content.
- bin: It is a binary file containing the model parameters.
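For instance, you can view the first few lines of the encoder param file (we will examine them in detail in 6. Modify the exported encoder for sherpa-ncnn):
head -n 3 ./encoder_jit_trace-pnnx.ncnn.param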
We compare below the file sizes of the models before and after converting via pnnx:
File name | File size
---|---
encoder_jit_trace-pnnx.pt | 318 MB
decoder_jit_trace-pnnx.pt | 1010 KB
joiner_jit_trace-pnnx.pt | 3.0 MB
encoder_jit_trace-pnnx.ncnn.bin | 159 MB
decoder_jit_trace-pnnx.ncnn.bin | 503 KB
joiner_jit_trace-pnnx.ncnn.bin | 1.5 MB
You can see that the file sizes of the models after conversion are about half those of the models before conversion:
encoder: 318 MB vs 159 MB
decoder: 1010 KB vs 503 KB
joiner: 3.0 MB vs 1.5 MB
The reason is that by default pnnx converts float32 parameters to float16. A float32 parameter occupies 4 bytes, while a float16 parameter occupies only 2 bytes. Thus, the model is about half as large after conversion.
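A quick sanity check using the encoder parameter count from the export log above (83137520 parameters, 2 bytes each):
python3 -c 'print(83137520 * 2 / 1024 / 1024)'  # ~158.6 (MB), matching the 159 MB encoder_jit_trace-pnnx.ncnn.bin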
Hint
If you use pnnx ./encoder_jit_trace-pnnx.pt fp16=0, then pnnx won’t convert float32 to float16.
5. Test the exported models in icefall
Note
We assume you have set up the environment variable PYTHONPATH when building ncnn.
Now we have successfully converted our pre-trained model to the ncnn format. The 6 generated files are all we need. You can use the following code to test the converted models:
python3 ./lstm_transducer_stateless2/streaming-ncnn-decode.py \
--tokens ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/tokens.txt \
--encoder-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin \
./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav
Hint
ncnn supports only batch size == 1, so streaming-ncnn-decode.py accepts only 1 wave file as input.
The output is given below:
2023-02-17 11:37:30,861 INFO [streaming-ncnn-decode.py:255] {'tokens': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/tokens.txt', 'encoder_param_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param', 'encoder_bin_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin', 'decoder_param_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param', 'decoder_bin_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin', 'joiner_param_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param', 'joiner_bin_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin', 'sound_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav'}
2023-02-17 11:37:31,425 INFO [streaming-ncnn-decode.py:263] Constructing Fbank computer
2023-02-17 11:37:31,427 INFO [streaming-ncnn-decode.py:266] Reading sound files: ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav
2023-02-17 11:37:31,431 INFO [streaming-ncnn-decode.py:271] torch.Size([106000])
2023-02-17 11:37:34,115 INFO [streaming-ncnn-decode.py:342] ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav
2023-02-17 11:37:34,115 INFO [streaming-ncnn-decode.py:343] AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
Congratulations! You have successfully exported a model from PyTorch to ncnn!
6. Modify the exported encoder for sherpa-ncnn
In order to use the exported models in sherpa-ncnn, we have to modify encoder_jit_trace-pnnx.ncnn.param.
Let us have a look at the first few lines of encoder_jit_trace-pnnx.ncnn.param:
7767517
267 379
Input in0 0 1 in0
Explanation of the above three lines:
- 7767517: it is a magic number and should not be changed.
- 267 379: the first number, 267, specifies the number of layers in this file, while 379 specifies the number of intermediate outputs of this file.
- Input in0 0 1 in0: Input is the layer type of this layer; in0 is the layer name of this layer; 0 means this layer has no input; 1 means this layer has one output; in0 is the output name of this layer.
We need to add 1 extra line and also increment the number of layers. The result looks like this:
7767517
268 379
SherpaMetaData sherpa_meta_data1 0 0 0=3 1=12 2=512 3=1024
Input in0 0 1 in0
Explanation:
- 7767517: it is still the same.
- 268 379: we have added an extra layer, so we need to update 267 to 268. We don't need to change 379 since the newly added layer has no inputs or outputs.
- SherpaMetaData sherpa_meta_data1 0 0 0=3 1=12 2=512 3=1024: this line is newly added. Its explanation is given below:
  - SherpaMetaData is the type of this layer. Must be SherpaMetaData.
  - sherpa_meta_data1 is the name of this layer. Must be sherpa_meta_data1.
  - 0 0 means this layer has no inputs or outputs. Must be 0 0.
  - 0=3: 0 is the key and 3 is the value. MUST be 0=3.
  - 1=12: 1 is the key and 12 is the value of the parameter --num-encoder-layers that you provided when running ./lstm_transducer_stateless2/export-for-ncnn.py.
  - 2=512: 2 is the key and 512 is the value of the parameter --encoder-dim that you provided when running ./lstm_transducer_stateless2/export-for-ncnn.py.
  - 3=1024: 3 is the key and 1024 is the value of the parameter --rnn-hidden-size that you provided when running ./lstm_transducer_stateless2/export-for-ncnn.py.
  For ease of reference, we list the key-value pairs that you need to add in the following table. If your model has a different setting, please change the values for SherpaMetaData accordingly. Otherwise, you will be SAD.
  key | value
  ---|---
  0 | 3 (fixed)
  1 | --num-encoder-layers
  2 | --encoder-dim
  3 | --rnn-hidden-size
- Input in0 0 1 in0: no need to change it.
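If you prefer not to edit the file by hand, here is a minimal sketch that performs the same modification with awk. It assumes the layout shown above (line 1: magic number; line 2: layer count and blob count; layers from line 3 onward); change the values for keys 1, 2, and 3 if your model uses a different setting:
param=./encoder_jit_trace-pnnx.ncnn.param
awk 'NR==2 { print $1 + 1, $2; next }  # increment the layer count
     NR==3 { print "SherpaMetaData sherpa_meta_data1 0 0 0=3 1=12 2=512 3=1024" }  # insert the new layer
     { print }' $param > $param.tmp && mv $param.tmp $param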
Caution
When you add a new layer SherpaMetaData, please remember to update the number of layers. In our case, update 267 to 268. Otherwise, you will be SAD later.
Hint
After adding the new layer SherpaMetaData, you cannot use this model with streaming-ncnn-decode.py anymore since SherpaMetaData is supported only in sherpa-ncnn.
Hint
ncnn is very flexible. You can add new layers to it just by text-editing the param file! You don’t need to change the bin file.
Now you can use this model in sherpa-ncnn. Please refer to the following documentation:
- Linux/macOS/Windows/arm/aarch64: https://k2-fsa.github.io/sherpa/ncnn/install/index.html
- Android: https://k2-fsa.github.io/sherpa/ncnn/android/index.html
- Python: https://k2-fsa.github.io/sherpa/ncnn/python/index.html
We have a list of pre-trained models that have been exported for sherpa-ncnn:
https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html
You can find more usage examples there.
7. (Optional) int8 quantization with sherpa-ncnn
This step is optional.
In this step, we describe how to quantize our model with int8.
Go back to 4. Export torchscript model via pnnx, but this time disable fp16 when using pnnx:
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
pnnx ./encoder_jit_trace-pnnx.pt fp16=0
pnnx ./decoder_jit_trace-pnnx.pt
pnnx ./joiner_jit_trace-pnnx.pt fp16=0
Note
We add fp16=0 when exporting the encoder and joiner. ncnn does not support quantizing the decoder model yet. We will update this documentation once ncnn supports it. (Maybe in 2023.)
ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*_jit_trace-pnnx.ncnn.{param,bin}
-rw-r--r-- 1 kuangfangjun root 503K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 437 Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param
-rw-r--r-- 1 kuangfangjun root 317M Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 21K Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param
-rw-r--r-- 1 kuangfangjun root 3.0M Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 488 Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param
Let us compare the file sizes again:
File name | File size
---|---
encoder_jit_trace-pnnx.pt | 318 MB
decoder_jit_trace-pnnx.pt | 1010 KB
joiner_jit_trace-pnnx.pt | 3.0 MB
encoder_jit_trace-pnnx.ncnn.bin (fp16) | 159 MB
decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB
joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB
encoder_jit_trace-pnnx.ncnn.bin (fp32) | 317 MB
joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB
You can see that the file sizes are doubled when we disable fp16.
Note
You can again use streaming-ncnn-decode.py to test the exported models.
Next, follow 6. Modify the exported encoder for sherpa-ncnn to modify encoder_jit_trace-pnnx.ncnn.param.
Change
7767517
267 379
Input in0 0 1 in0
to
7767517
268 379
SherpaMetaData sherpa_meta_data1 0 0 0=3 1=12 2=512 3=1024
Input in0 0 1 in0
Caution
Please follow 6. Modify the exported encoder for sherpa-ncnn to change the values for SherpaMetaData if your model uses a different setting.
Next, let us compile sherpa-ncnn since we will quantize our models within sherpa-ncnn.
# We will download sherpa-ncnn to $HOME/open-source/
# You can change it to anywhere you like.
cd $HOME
mkdir -p open-source
cd open-source
git clone https://github.com/k2-fsa/sherpa-ncnn
cd sherpa-ncnn
mkdir build
cd build
cmake ..
make -j 4
./bin/generate-int8-scale-table
export PATH=$HOME/open-source/sherpa-ncnn/build/bin:$PATH
The output of the above commands is:
(py38) kuangfangjun:build$ generate-int8-scale-table
Please provide 10 arg. Currently given: 1
Usage:
generate-int8-scale-table encoder.param encoder.bin decoder.param decoder.bin joiner.param joiner.bin encoder-scale-table.txt joiner-scale-table.txt wave_filenames.txt
Each line in wave_filenames.txt is a path to some 16k Hz mono wave file.
We need to create a file wave_filenames.txt, in which we put the paths of some calibration wave files. For testing purposes, we use the test_wavs from the pre-trained model repository
https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
cd egs/librispeech/ASR
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
cat <<EOF > wave_filenames.txt
../test_wavs/1089-134686-0001.wav
../test_wavs/1221-135766-0001.wav
../test_wavs/1221-135766-0002.wav
EOF
Now we can calculate the scales needed for quantization with the calibration data:
cd egs/librispeech/ASR
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
generate-int8-scale-table \
./encoder_jit_trace-pnnx.ncnn.param \
./encoder_jit_trace-pnnx.ncnn.bin \
./decoder_jit_trace-pnnx.ncnn.param \
./decoder_jit_trace-pnnx.ncnn.bin \
./joiner_jit_trace-pnnx.ncnn.param \
./joiner_jit_trace-pnnx.ncnn.bin \
./encoder-scale-table.txt \
./joiner-scale-table.txt \
./wave_filenames.txt
The output logs are given below:
Don't Use GPU. has_gpu: 0, config.use_vulkan_compute: 1
num encoder conv layers: 28
num joiner conv layers: 3
num files: 3
Processing ../test_wavs/1089-134686-0001.wav
Processing ../test_wavs/1221-135766-0001.wav
Processing ../test_wavs/1221-135766-0002.wav
Processing ../test_wavs/1089-134686-0001.wav
Processing ../test_wavs/1221-135766-0001.wav
Processing ../test_wavs/1221-135766-0002.wav
----------encoder----------
conv_15 : max = 15.942385 threshold = 15.930708 scale = 7.972025
conv_16 : max = 44.978855 threshold = 17.031788 scale = 7.456645
conv_17 : max = 17.868437 threshold = 7.830528 scale = 16.218575
linear_18 : max = 3.107259 threshold = 1.194808 scale = 106.293236
linear_19 : max = 6.193777 threshold = 4.634748 scale = 27.401705
linear_20 : max = 9.259933 threshold = 2.606617 scale = 48.722160
linear_21 : max = 5.186600 threshold = 4.790260 scale = 26.512129
linear_22 : max = 9.759041 threshold = 2.265832 scale = 56.050053
linear_23 : max = 3.931209 threshold = 3.099090 scale = 40.979767
linear_24 : max = 10.324160 threshold = 2.215561 scale = 57.321835
linear_25 : max = 3.800708 threshold = 3.599352 scale = 35.284134
linear_26 : max = 10.492444 threshold = 3.153369 scale = 40.274391
linear_27 : max = 3.660161 threshold = 2.720994 scale = 46.674126
linear_28 : max = 9.415265 threshold = 3.174434 scale = 40.007133
linear_29 : max = 4.038418 threshold = 3.118534 scale = 40.724262
linear_30 : max = 10.072084 threshold = 3.936867 scale = 32.259155
linear_31 : max = 4.342712 threshold = 3.599489 scale = 35.282787
linear_32 : max = 11.340535 threshold = 3.120308 scale = 40.701103
linear_33 : max = 3.846987 threshold = 3.630030 scale = 34.985939
linear_34 : max = 10.686298 threshold = 2.204571 scale = 57.607586
linear_35 : max = 4.904821 threshold = 4.575518 scale = 27.756420
linear_36 : max = 11.806659 threshold = 2.585589 scale = 49.118401
linear_37 : max = 6.402340 threshold = 5.047157 scale = 25.162680
linear_38 : max = 11.174589 threshold = 1.923361 scale = 66.030258
linear_39 : max = 16.178576 threshold = 7.556058 scale = 16.807705
linear_40 : max = 12.901954 threshold = 5.301267 scale = 23.956539
linear_41 : max = 14.839805 threshold = 7.597429 scale = 16.716181
linear_42 : max = 10.178945 threshold = 2.651595 scale = 47.895699
----------joiner----------
linear_2 : max = 24.829245 threshold = 16.627592 scale = 7.637907
linear_1 : max = 10.746186 threshold = 5.255032 scale = 24.167313
linear_3 : max = 1.000000 threshold = 0.999756 scale = 127.031013
ncnn int8 calibration table create success, best wish for your int8 inference has a low accuracy loss...\(^0^)/...233...
It generates the following two files:
ls -lh encoder-scale-table.txt joiner-scale-table.txt
-rw-r--r-- 1 kuangfangjun root 345K Feb 17 12:13 encoder-scale-table.txt
-rw-r--r-- 1 kuangfangjun root 17K Feb 17 12:13 joiner-scale-table.txt
Caution
You definitely need more calibration data than the three files used here to compute a reliable scale table; see the sketch below.
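For example, assuming you have a directory of 16 kHz mono wave files for calibration (the path below is a placeholder; point it at your own data), you can build a larger wave_filenames.txt with:
find /path/to/calibration/wavs -name '*.wav' > wave_filenames.txt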
Finally, let us use the scale table to quantize our models into int8.
ncnn2int8
usage: ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration table]
First, we quantize the encoder model:
cd egs/librispeech/ASR
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
ncnn2int8 \
./encoder_jit_trace-pnnx.ncnn.param \
./encoder_jit_trace-pnnx.ncnn.bin \
./encoder_jit_trace-pnnx.ncnn.int8.param \
./encoder_jit_trace-pnnx.ncnn.int8.bin \
./encoder-scale-table.txt
Next, we quantize the joiner model:
ncnn2int8 \
./joiner_jit_trace-pnnx.ncnn.param \
./joiner_jit_trace-pnnx.ncnn.bin \
./joiner_jit_trace-pnnx.ncnn.int8.param \
./joiner_jit_trace-pnnx.ncnn.int8.bin \
./joiner-scale-table.txt
The above two commands generate the following 4 files:
-rw-r--r-- 1 kuangfangjun root 218M Feb 17 12:19 encoder_jit_trace-pnnx.ncnn.int8.bin
-rw-r--r-- 1 kuangfangjun root 21K Feb 17 12:19 encoder_jit_trace-pnnx.ncnn.int8.param
-rw-r--r-- 1 kuangfangjun root 774K Feb 17 12:19 joiner_jit_trace-pnnx.ncnn.int8.bin
-rw-r--r-- 1 kuangfangjun root 496 Feb 17 12:19 joiner_jit_trace-pnnx.ncnn.int8.param
Congratulations! You have successfully quantized your model from float32 to int8.
Caution
ncnn.int8.param and ncnn.int8.bin must be used in pairs.
You can replace ncnn.param and ncnn.bin with ncnn.int8.param and ncnn.int8.bin in sherpa-ncnn if you like.
For instance, to use only the int8 encoder in sherpa-ncnn, you can replace the following invocation:
cd egs/librispeech/ASR
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/

sherpa-ncnn \
  ../data/lang_bpe_500/tokens.txt \
  ./encoder_jit_trace-pnnx.ncnn.param \
  ./encoder_jit_trace-pnnx.ncnn.bin \
  ./decoder_jit_trace-pnnx.ncnn.param \
  ./decoder_jit_trace-pnnx.ncnn.bin \
  ./joiner_jit_trace-pnnx.ncnn.param \
  ./joiner_jit_trace-pnnx.ncnn.bin \
  ../test_wavs/1089-134686-0001.wav
with
cd egs/librispeech/ASR
cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/

sherpa-ncnn \
  ../data/lang_bpe_500/tokens.txt \
  ./encoder_jit_trace-pnnx.ncnn.int8.param \
  ./encoder_jit_trace-pnnx.ncnn.int8.bin \
  ./decoder_jit_trace-pnnx.ncnn.param \
  ./decoder_jit_trace-pnnx.ncnn.bin \
  ./joiner_jit_trace-pnnx.ncnn.param \
  ./joiner_jit_trace-pnnx.ncnn.bin \
  ../test_wavs/1089-134686-0001.wav
The following table compares the file sizes again:
File name | File size
---|---
encoder_jit_trace-pnnx.pt | 318 MB
decoder_jit_trace-pnnx.pt | 1010 KB
joiner_jit_trace-pnnx.pt | 3.0 MB
encoder_jit_trace-pnnx.ncnn.bin (fp16) | 159 MB
decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB
joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB
encoder_jit_trace-pnnx.ncnn.bin (fp32) | 317 MB
joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB
encoder_jit_trace-pnnx.ncnn.int8.bin | 218 MB
joiner_jit_trace-pnnx.ncnn.int8.bin | 774 KB
You can see that the file size of the joiner model after int8 quantization is much smaller. However, the size of the encoder model is even larger than its fp16 counterpart. The reason is that ncnn currently does not support quantizing LSTM layers into 8-bit. Please see
https://github.com/Tencent/ncnn/issues/4532
Hint
Currently, only linear layers and convolutional layers are quantized with int8, so you don’t see an exact 4x reduction in file sizes.
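As a rough sanity check for the joiner, whose parameters sit almost entirely in linear layers (1 byte per int8 parameter, using the joiner parameter count 781812 from the export log):
python3 -c 'print(781812 / 1024)'  # ~763 (KB), close to the 774K int8 joiner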
Note
You need to test the recognition accuracy after int8 quantization.
That’s it! Have fun with sherpa-ncnn!