Pretrained model with GigaSpeech
Hint
We assume you have installed sherpa by following
Installation before you start this section.
Download the pretrained model
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2
Hint
You can find the training script by visiting https://github.com/k2-fsa/icefall/blob/master/egs/gigaspeech/ASR/RESULTS.md#gigaspeech-bpe-training-results-pruned-transducer-2
The torchscript model is exported using the script https://github.com/k2-fsa/icefall/blob/master/egs/gigaspeech/ASR/pruned_transducer_stateless2/export.py
Caution
You have to use git lfs to download/clone the repo. Otherwise, you will be SAD later.
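If you are unsure whether git lfs actually fetched the model files, the following sketch (a hypothetical helper, not part of sherpa) checks whether a file is a real model or merely a git-lfs pointer stub:

```shell
# Hypothetical helper (not part of sherpa): a file cloned without
# git-lfs is a tiny text pointer starting with
# "version https://git-lfs.github.com/spec/v1".
is_lfs_pointer() {
  if head -c 64 "$1" | grep -q '^version https://git-lfs'; then
    echo "pointer (re-run git lfs pull)"
  else
    echo "real file"
  fi
}

# Example: check the torchscript models after cloning.
# for f in icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/*.pt; do
#   echo "$f: $(is_lfs_pointer "$f")"
# done
```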
After cloning the repo, you will find the following files:
icefall-asr-gigaspeech-pruned-transducer-stateless2/
|-- README.md
|-- data
| `-- lang_bpe_500
| `-- bpe.model
|-- exp
| |-- cpu_jit-iter-3488000-avg-15.pt
| |-- cpu_jit-iter-3488000-avg-20.pt
| |-- pretrained-iter-3488000-avg-15.pt
| `-- pretrained-iter-3488000-avg-20.pt
data/lang_bpe_500/bpe.model is the BPE model used in the training. exp/cpu_jit-iter-3488000-avg-15.pt and exp/cpu_jit-iter-3488000-avg-20.pt are two torchscript models exported using torch.jit.script(). We can use either of them in the following tests.
Note
We won’t use pretrained-xxx.pt in sherpa.
Before we start, let us generate tokens.txt from the above bpe.model:
cd icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500
wget https://raw.githubusercontent.com/k2-fsa/sherpa/master/scripts/bpe_model_to_tokens.py
./bpe_model_to_tokens.py ./bpe.model > tokens.txt
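The generated tokens.txt is expected to contain one "symbol id" pair per line. As a quick sanity check (a sketch, not part of the official scripts), you can verify the two-column format with awk:

```shell
# Sketch of a sanity check (not part of sherpa): every line of
# tokens.txt should have exactly two fields, a symbol and its
# integer id.
check_tokens() {
  awk 'NF != 2 { bad = 1 } END { exit bad }' "$1" \
    && echo "tokens.txt format OK" \
    || echo "tokens.txt has malformed lines"
}
# Usage:
# check_tokens ./tokens.txt
```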
Since the above repo does not contain test waves, we download some test files from https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13 for testing.
cd icefall-asr-gigaspeech-pruned-transducer-stateless2
mkdir test_wavs
cd test_wavs
wget https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/resolve/main/test_wavs/1089-134686-0001.wav
wget https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/resolve/main/test_wavs/1221-135766-0001.wav
wget https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/resolve/main/test_wavs/1221-135766-0002.wav
In the following, we show you how to use the downloaded model for speech recognition.
Decode a single wave
nn_model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt
tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt
wav1=./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1089-134686-0001.wav
sherpa \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
$wav1
You will see the following output:
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/parse_options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2022-08-20 22:35:42 sherpa --nn-model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt --tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt --use-gpu=false ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1089-134686-0001.wav
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/sherpa.cc:126:int main(int, char**) 2022-08-20 22:35:42
--nn-model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt
--tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt
--decoding-method=greedy_search
--use-gpu=false
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/sherpa.cc:270:int main(int, char**) 2022-08-20 22:35:43
filename: ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1089-134686-0001.wav
result: AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
Hint
You can pass the option --use-gpu=true to use a GPU for computation (assuming
you have installed a CUDA version of sherpa).
Also, you can use --decoding-method=modified_beam_search to change
the decoding method.
Decode multiple waves in parallel
nn_model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt
tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt
wav1=./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1089-134686-0001.wav
wav2=./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0001.wav
wav3=./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0002.wav
sherpa \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
$wav1 \
$wav2 \
$wav3
You will see the following output:
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/parse_options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2022-08-20 22:38:18 sherpa --nn-model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt --tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt --use-gpu=false ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1089-134686-0001.wav ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0001.wav ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0002.wav
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/sherpa.cc:126:int main(int, char**) 2022-08-20 22:38:19
--nn-model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt
--tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt
--decoding-method=greedy_search
--use-gpu=false
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/sherpa.cc:284:int main(int, char**) 2022-08-20 22:38:23
filename: ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1089-134686-0001.wav
result: AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
filename: ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0001.wav
result: GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
filename: ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0002.wav
result: YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
Decode wav.scp
If you have some experience with Kaldi, you are probably familiar with wav.scp.
We use the following code to generate wav.scp for our test data.
cat > wav.scp <<EOF
wav1 ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1089-134686-0001.wav
wav2 ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0001.wav
wav3 ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs/1221-135766-0002.wav
EOF
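Instead of typing the entries by hand, you can also generate wav.scp from the files in a directory. The following sketch uses each file's basename as the utterance id (instead of wav1, wav2, wav3 above) and assumes the basenames are unique:

```shell
# Sketch (an alternative to the heredoc above): build wav.scp from
# the *.wav files in a directory; each output line is "<utt-id> <path>",
# where the utterance id is the file's basename. Assumes unique basenames.
make_wav_scp() {
  for w in "$1"/*.wav; do
    printf '%s %s\n' "$(basename "$w" .wav)" "$w"
  done
}
# Usage:
# make_wav_scp ./icefall-asr-gigaspeech-pruned-transducer-stateless2/test_wavs > wav.scp
```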
With the wav.scp ready, we can decode it with the following commands:
nn_model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt
tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt
sherpa \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
--use-wav-scp=true \
scp:wav.scp \
ark,scp,t:results.ark,results.scp
You will see the following output:
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/parse_options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2022-08-20 22:40:36 sherpa --nn-model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt --tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt --use-gpu=false --use-wav-scp=true scp:wav.scp ark,scp,t:results.ark,results.scp
[I] /usr/share/miniconda/envs/sherpa/conda-bld/sherpa_1661003501349/work/sherpa/csrc/sherpa.cc:126:int main(int, char**) 2022-08-20 22:40:37
--nn-model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt
--tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt
--decoding-method=greedy_search
--use-gpu=false
We can view the recognition results using:
$ cat results.ark
wav1 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
wav2 GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
wav3 YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
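Since results.ark is written in text form (the t in ark,scp,t:), you can also look up the transcript of a single utterance by its key. A sketch:

```shell
# Sketch: look up one utterance's transcript in the text-form
# results.ark by its key (the first field of each line).
get_result() {
  awk -v key="$2" '$1 == key { $1 = ""; sub(/^ /, ""); print }' "$1"
}
# Usage:
# get_result results.ark wav3
```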
Hint
You can pass the option --batch-size=20 to set the batch size to 20
during decoding.
Decode feats.scp
If you have precomputed features, you can decode them with the following commands:
nn_model=./icefall-asr-gigaspeech-pruned-transducer-stateless2/exp/cpu_jit-iter-3488000-avg-15.pt
tokens=./icefall-asr-gigaspeech-pruned-transducer-stateless2/data/lang_bpe_500/tokens.txt
sherpa \
--nn-model=$nn_model \
--tokens=$tokens \
--use-gpu=false \
--use-feats-scp=true \
scp:feats.scp \
ark,scp,t:results.ark,results.scp
Hint
You can pass the option --batch-size=20 to set the batch size to 20
during decoding.
Caution
feats.scp generated by Kaldi's compute-fbank-feats uses
unnormalized samples. That is, audio samples are in the range
[-32768, 32767]. However, models from icefall are trained with
features computed from normalized samples, i.e., samples in the range [-1, 1].
You cannot use feats.scp generated by Kaldi's compute-fbank-feats
to test models trained by icefall on normalized audio samples.
Otherwise, you won't get good recognition results.
It is perfectly OK to decode feats.scp from Kaldi using a model
trained with features computed from unnormalized audio samples.
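The two conventions differ by a constant factor: normalization typically divides each 16-bit sample by 32768. A small numeric illustration (a sketch of the mapping, not a feature-extraction script):

```shell
# Illustration: mapping unnormalized 16-bit sample values to the
# [-1, 1] range that icefall models expect (divide by 32768).
echo "16384 -32768 32767" \
  | awk '{ for (i = 1; i <= NF; i++) printf "%.5f ", $i / 32768; print "" }'
# prints: 0.50000 -1.00000 0.99997
```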
Note
We provide a script to generate feats.ark and feats.scp from
wav.scp that can be used with models trained by icefall. Please see
https://github.com/k2-fsa/sherpa/blob/master/.github/scripts/generate_feats_scp.py