This page shows you how to run the yesno recipe. It contains:
Prepare data for training
Train a TDNN model
View text format logs and visualize TensorBoard logs
Select device type, i.e., CPU or GPU, for training
Change training options
Resume training from a checkpoint
Decode with a trained model
Select a checkpoint for decoding
Model averaging
Colab notebook
It shows you step by step how to set up the environment, how to do training,
and how to do decoding
How to use a pre-trained model
Inference with a pre-trained model
Download a pre-trained model provided by us
Decode a single sound file with a pre-trained model
Decode multiple sound files at the same time
It does NOT show you:
How to train with multiple GPUs
The yesno dataset is so small that CPU is more than enough
for training as well as for decoding.
How to use LM rescoring for decoding
The dataset does not have an LM for rescoring.
Hint
We assume you have read the page Installation and have set up
the environment for icefall.
Hint
You don’t need a GPU to run this recipe. It can be run on a CPU.
The training part takes less than 30 seconds on a CPU and you will get
the following WER at the end:
By default, it will run 15 epochs. Training logs and checkpoints are saved
in tdnn/exp.
In tdnn/exp, you will find the following files:
epoch-0.pt, epoch-1.pt, …
These are checkpoint files, containing model state_dict and optimizer state_dict.
To resume training from some checkpoint, say epoch-10.pt, you can use:
$ ./tdnn/train.py --start-epoch 11
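To make the checkpoint layout described above concrete, here is a minimal sketch of saving such a dictionary and resuming from it. It uses pickle in place of torch.save to stay self-contained, and the key names ("model", "optimizer", "epoch") are illustrative assumptions, not icefall's exact schema.

```python
import os
import pickle
import tempfile

# A checkpoint is essentially a dict bundling the model's state_dict and
# the optimizer's state_dict, plus bookkeeping such as the epoch number.
# (Illustrative keys; real icefall checkpoints are written with torch.save.)
checkpoint = {
    "model": {"linear.weight": [0.1, 0.2], "linear.bias": [0.0]},
    "optimizer": {"lr": 0.01, "step": 1000},
    "epoch": 10,
}

path = os.path.join(tempfile.mkdtemp(), "epoch-10.pt")
with open(path, "wb") as f:
    pickle.dump(checkpoint, f)

# Resuming: load the dict, restore both state_dicts, and continue training
# from the epoch after the one that was saved.
with open(path, "rb") as f:
    restored = pickle.load(f)

start_epoch = restored["epoch"] + 1  # resuming from epoch-10.pt -> epoch 11
```

This mirrors what `--start-epoch 11` does: it tells the training script to load `epoch-10.pt` and continue from epoch 11.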
tensorboard/
This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:
$ cd tdnn/exp/tensorboard
$ tensorboard dev upload --logdir . --description "TDNN training for yesno with icefall"
It will print something like the following:
TensorFlow installation not found - running with reduced feature set.
Upload started and will continue reading any new data as it's added to the logdir.
To stop uploading, press Ctrl-C.
New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
[2021-08-23T23:49:41] Started scanning logdir.
[2021-08-23T23:49:42] Total uploaded: 135 scalars, 0 tensors, 0 binary objects
Listening for new data in logdir...
Note there is a URL in the above output, click it and you will see
the following screenshot:
It is the detailed training log in text format, same as the one
you saw printed to the console during training.
Note
By default, ./tdnn/train.py uses GPU 0 for training if GPUs are available.
If you have two GPUs, say, GPU 0 and GPU 1, and you want to use GPU 1 for
training, you can run:
$ export CUDA_VISIBLE_DEVICES="1"
$ ./tdnn/train.py
Since the yesno dataset is very small, containing only 30 sound files
for training, and the model in use is also very small, we use:
$ export CUDA_VISIBLE_DEVICES=""
so that ./tdnn/train.py uses CPU during training.
If you don’t have GPUs, then you don’t need to
run export CUDA_VISIBLE_DEVICES="".
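The effect of CUDA_VISIBLE_DEVICES="" can be sketched in a few lines of Python. pick_device() is a hypothetical helper, not part of icefall; real training code would consult torch.cuda.is_available(), which returns False when all GPUs are hidden this way.

```python
import os


def pick_device() -> str:
    # Simplified sketch of device selection: an empty CUDA_VISIBLE_DEVICES
    # hides every GPU from the process, so training falls back to the CPU.
    # Otherwise we assume a visible GPU can be used.
    if os.environ.get("CUDA_VISIBLE_DEVICES") == "":
        return "cpu"
    return "cuda"


os.environ["CUDA_VISIBLE_DEVICES"] = ""  # same effect as the export above
device = pick_device()
```

Setting the variable to "1" instead would leave only GPU 1 visible, which the process then sees as device 0.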
To see available training options, you can use:
$ ./tdnn/train.py --help
Other training options, e.g., learning rate, results dir, etc., are
pre-configured in the function get_params()
in tdnn/train.py.
Normally, you don’t need to change them; if you do, you can change them
by modifying the code.
The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.
The command for decoding is:
$ export CUDA_VISIBLE_DEVICES=""
$ ./tdnn/decode.py
You will see the WER in the output log.
Decoded results are saved in tdnn/exp.
$ ./tdnn/decode.py --help
shows you the available decoding options.
Some commonly used options are:
--epoch
It selects which checkpoint to use for decoding.
For instance, ./tdnn/decode.py --epoch 10 means to use
./tdnn/exp/epoch-10.pt for decoding.
--avg
It’s related to model averaging. It specifies the number of checkpoints
to be averaged; the averaged model is used for decoding.
For example, the following command:
$ ./tdnn/decode.py --epoch 10 --avg 3
uses the average of epoch-8.pt, epoch-9.pt and epoch-10.pt
for decoding.
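Conceptually, checkpoint averaging computes the element-wise mean of the parameters across the selected checkpoints. The sketch below uses plain floats in place of tensors and made-up parameter names; average_checkpoints is an illustrative helper, not icefall's implementation.

```python
def average_checkpoints(state_dicts):
    """Return a state_dict whose values are the element-wise mean of the
    corresponding values in the given state_dicts (floats stand in for
    tensors in this sketch)."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}


# With --epoch 10 --avg 3, the three most recent checkpoints up to and
# including epoch-10.pt are averaged (parameter names are made up):
ckpts = [
    {"w": 1.0, "b": 0.0},  # stands in for epoch-8.pt
    {"w": 2.0, "b": 0.3},  # stands in for epoch-9.pt
    {"w": 3.0, "b": 0.6},  # stands in for epoch-10.pt
]
avg = average_checkpoints(ckpts)  # the averaged model used for decoding
```

Averaging several checkpoints from the end of training often smooths out noise in the individual models' parameters.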
--export
If it is True, i.e., ./tdnn/decode.py --export 1, the code
will save the averaged model to tdnn/exp/pretrained.pt.
See Pre-trained Model for how to use it.