This recipe uses GigaSpeech + LibriSpeech during training.
(1) and (2) use the same model architecture. The only difference is that (2) supports
multi-dataset training. Since (2) uses more data, it achieves a lower WER than (1),
but it needs more training time.
We use lstm_transducer_stateless2 as an example below.
Note
You need to download the GigaSpeech dataset
to run (2). If you have only the LibriSpeech dataset available, feel free to use (1).
If you have pre-downloaded the LibriSpeech
dataset and the musan dataset, say,
they are saved in /tmp/LibriSpeech and /tmp/musan, you can modify
the dl_dir variable in ./prepare.sh to point to /tmp so that
./prepare.sh won’t re-download them.
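For example, if both datasets live under /tmp, the only change needed is:

# In ./prepare.sh: point dl_dir at the directory that already
# contains LibriSpeech/ and musan/
dl_dir=/tmp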
Note
All files generated by ./prepare.sh, e.g., features, lexicon, etc.,
are saved in the ./data directory.
We provide the following YouTube video showing how to run ./prepare.sh.
Note
To get the latest news of next-gen Kaldi, please subscribe to
the following YouTube channel by Nadira Povey:
Running ./lstm_transducer_stateless2/train.py --help shows you the training options that can be passed from the commandline.
The following options are used quite often:
--full-libri
If it’s True, the training part uses all the training data, i.e.,
960 hours. Otherwise, the training part uses only the subset
train-clean-100, which has 100 hours of training data.
Caution
The training set is speed-perturbed with two factors: 0.9 and 1.1.
If --full-libri is True, each epoch actually processes
3 x 960 == 2880 hours of data.
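For example, a sketch of disabling it (assuming the script accepts 0/1 for its boolean flags, as is common in icefall recipes):

# Train only on train-clean-100 (100 hours) instead of all 960 hours
$ ./lstm_transducer_stateless2/train.py --full-libri 0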
--num-epochs
It is the number of epochs to train. For instance,
./lstm_transducer_stateless2/train.py --num-epochs 30 trains for 30 epochs
and generates epoch-1.pt, epoch-2.pt, …, epoch-30.pt
in the folder ./lstm_transducer_stateless2/exp.
--start-epoch
It’s used to resume training.
./lstm_transducer_stateless2/train.py --start-epoch 10 loads the
checkpoint ./lstm_transducer_stateless2/exp/epoch-9.pt and starts
training from epoch 10, based on the state from epoch 9.
--world-size
It is used for multi-GPU single-machine DDP training.
If it is 1, then no DDP training is used.
If it is 2, then GPU 0 and GPU 1 are used for DDP training.
The following shows some use cases with it.
Use case 1: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:
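A minimal sketch, relying on the standard CUDA_VISIBLE_DEVICES environment variable to select the physical GPUs:

# Expose only GPU 0 and GPU 2 to the training script; they are
# re-numbered as GPU 0 and GPU 1 inside the DDP run
$ export CUDA_VISIBLE_DEVICES="0,2"
$ ./lstm_transducer_stateless2/train.py --world-size 2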
There are some training options, e.g., weight decay,
number of warmup steps, results dir, etc.,
that are not passed from the commandline.
They are pre-configured by the function get_params() in
lstm_transducer_stateless2/train.py.
You don’t need to change these pre-configured parameters. If you really need to change
them, please modify ./lstm_transducer_stateless2/train.py directly.
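If you just want to see the current values without editing anything, a quick sketch using standard shell tools (the 60-line window is arbitrary):

# Print the pre-configured parameters (weight decay, warmup steps,
# results dir, etc.) defined in get_params()
$ grep -n -A 60 "def get_params" ./lstm_transducer_stateless2/train.py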
Training logs and checkpoints are saved in lstm_transducer_stateless2/exp.
You will find the following files in that directory:
epoch-1.pt, epoch-2.pt, …
These are checkpoint files saved at the end of each epoch, containing model
state_dict and optimizer state_dict.
To resume training from some checkpoint, say epoch-10.pt, you can use:
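Following the --start-epoch semantics described above (passing N loads epoch-(N-1).pt), a minimal sketch:

# Loads ./lstm_transducer_stateless2/exp/epoch-10.pt and continues
# training from epoch 11
$ ./lstm_transducer_stateless2/train.py --start-epoch 11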
checkpoint-436000.pt, checkpoint-438000.pt, …
These are checkpoint files saved every --save-every-n batches,
containing model state_dict and optimizer state_dict.
To resume training from some checkpoint, say checkpoint-436000.pt, you can use:
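A sketch, assuming the script provides a --start-batch option for batch-level checkpoints (check ./lstm_transducer_stateless2/train.py --help for the exact flag name):

# Loads ./lstm_transducer_stateless2/exp/checkpoint-436000.pt and
# resumes training from there (assumes a --start-batch flag)
$ ./lstm_transducer_stateless2/train.py --start-batch 436000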
tensorboard/
This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:
$ cd lstm_transducer_stateless2/exp/tensorboard
$ tensorboard dev upload --logdir . --description "LSTM transducer training for LibriSpeech with icefall"
It will print something like the following:
TensorFlow installation not found - running with reduced feature set.
Upload started and will continue reading any new data as it's added to the logdir.
To stop uploading, press Ctrl-C.
New experiment created.
View your TensorBoard at: https://tensorboard.dev/experiment/cj2vtPiwQHKN9Q1tx6PTpg/
[2022-09-20T15:50:50] Started scanning logdir.
Uploading 4468 scalars...
[2022-09-20T15:53:02] Total uploaded: 210171 scalars, 0 tensors, 0 binary objects
Listening for new data in logdir...
Note there is a URL in the above output. Click it and you will see
the following screenshot:
The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.
Hint
There are two kinds of checkpoints:
(1) epoch-1.pt, epoch-2.pt, …, which are saved at the end
of each epoch. You can pass --epoch to
lstm_transducer_stateless2/decode.py to use them.
(2) checkpoint-436000.pt, checkpoint-438000.pt, …, which are saved
every --save-every-n batches. You can pass --iter to
lstm_transducer_stateless2/decode.py to use them.
We suggest that you try both types of checkpoints and choose the one
that produces the lowest WERs.
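For example, a sketch of both variants; --epoch, --iter, and --avg are described above, while --exp-dir and --decoding-method greedy_search are assumptions based on the usual icefall decode.py conventions:

# Decode with an end-of-epoch checkpoint, averaging the last 10 epochs
$ ./lstm_transducer_stateless2/decode.py \
    --epoch 30 \
    --avg 10 \
    --exp-dir ./lstm_transducer_stateless2/exp \
    --decoding-method greedy_search

# Decode with a batch-level checkpoint instead
$ ./lstm_transducer_stateless2/decode.py \
    --iter 436000 \
    --avg 16 \
    --exp-dir ./lstm_transducer_stateless2/exp \
    --decoding-method greedy_search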
Checkpoints saved by lstm_transducer_stateless2/train.py also include
optimizer.state_dict(). It is useful for resuming training. But after training,
we are interested only in model.state_dict(). You can use the following
command to extract model.state_dict().
# Assume that --iter 468000 --avg 16 produces the smallest WER
# (You can get such information after running ./lstm_transducer_stateless2/decode.py)
iter=468000
avg=16

./lstm_transducer_stateless2/export.py \
  --exp-dir ./lstm_transducer_stateless2/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --iter $iter \
  --avg $avg
It will generate a file ./lstm_transducer_stateless2/exp/pretrained.pt.
Hint
To use the generated pretrained.pt for lstm_transducer_stateless2/decode.py,
you can run:
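One common approach in icefall recipes (a sketch; the epoch number 9999 is arbitrary) is to symlink pretrained.pt into the experiment directory so that decode.py can load it through --epoch:

$ cd lstm_transducer_stateless2/exp
$ ln -s pretrained.pt epoch-9999.pt

# Then pass --epoch 9999 --avg 1 to ./lstm_transducer_stateless2/decode.py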