How to create a recipe
Hint
Please read the section Follow the code style to adjust your code style.
Caution
icefall is designed to be as Pythonic as possible. Please use
Python in your recipe if possible.
Data Preparation
We recommend that you prepare your training/test/validation datasets with lhotse.
Please refer to https://lhotse.readthedocs.io/en/latest/index.html
for how to create a recipe in lhotse.
Hint
The yesno recipe in lhotse is a very good example.
Please refer to https://github.com/lhotse-speech/lhotse/pull/380,
which shows how to add a new recipe to lhotse.
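To make this concrete, the following is a rough Python sketch of what a lhotse preparation function for the hypothetical foo dataset could look like. The directory layout (one wav file per utterance plus a matching .txt transcript), the manifest file names, and the function name prepare_foo are assumptions made for illustration only; follow the pull request above for the actual conventions.

# A minimal sketch of a lhotse preparation function for the hypothetical
# "foo" dataset. The directory layout, file names, and prepare_foo itself
# are assumptions for illustration; see the lhotse PR linked above for
# real examples.
from pathlib import Path

from lhotse import Recording, RecordingSet, SupervisionSegment, SupervisionSet


def prepare_foo(corpus_dir: str, output_dir: str) -> dict:
    corpus_dir = Path(corpus_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    recordings = []
    segments = []
    # Assume each utterance is a single wav file whose transcript is stored
    # in a text file with the same stem, e.g., utt1.wav / utt1.txt.
    for wav in sorted(corpus_dir.glob("*.wav")):
        recording = Recording.from_file(wav)
        recordings.append(recording)
        text = wav.with_suffix(".txt").read_text().strip()
        segments.append(
            SupervisionSegment(
                id=recording.id,
                recording_id=recording.id,
                start=0.0,
                duration=recording.duration,
                text=text,
            )
        )

    recording_set = RecordingSet.from_recordings(recordings)
    supervision_set = SupervisionSet.from_segments(segments)

    # Store the manifests so that prepare.sh / asr_datamodule.py can load them.
    recording_set.to_file(output_dir / "foo_recordings_all.jsonl.gz")
    supervision_set.to_file(output_dir / "foo_supervisions_all.jsonl.gz")

    return {"recordings": recording_set, "supervisions": supervision_set}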
Suppose you would like to add a recipe for a dataset named foo.
You can do the following:
$ cd egs
$ mkdir -p foo/ASR
$ cd foo/ASR
$ touch prepare.sh
$ chmod +x prepare.sh
If your dataset is very simple, please follow
egs/yesno/ASR/prepare.sh
to write your own prepare.sh.
Otherwise, please refer to
egs/librispeech/ASR/prepare.sh
to prepare your data.
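Once prepare.sh has run, it can be helpful to sanity-check the generated manifests before starting training. Below is a small sketch, assuming prepare.sh wrote a cuts manifest under data/fbank/; the exact file name depends on your recipe.

# Sanity-check the manifests produced by prepare.sh.
# The manifest path below is only an assumption for illustration.
from lhotse import load_manifest

cuts = load_manifest("data/fbank/foo_cuts_train.jsonl.gz")
print(cuts)        # prints a summary of the CutSet
cuts.describe()    # prints duration statistics of the cuts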
Training
Assuming you have a fancy model called bar for the foo recipe, you can
organize your files in the following way:
$ cd egs/foo/ASR
$ mkdir bar
$ cd bar
$ touch README.md model.py train.py decode.py asr_datamodule.py pretrained.py
For instance, the yesno recipe has a tdnn model, and its directory structure
looks like the following:
egs/yesno/ASR/tdnn/
|-- README.md
|-- asr_datamodule.py
|-- decode.py
|-- model.py
|-- pretrained.py
`-- train.py
File description:
README.md: It contains information about this recipe, e.g., how to run it, what the WER is, etc.
asr_datamodule.py: It provides code to create PyTorch dataloaders for the train/test/validation datasets.
decode.py: It takes as inputs the checkpoints saved during the training stage to decode the test dataset(s).
model.py: It contains the definition of your fancy neural network model.
pretrained.py: It can be used to do inference with a pre-trained model.
train.py: It contains training code.
Hint
Please take a look at egs/yesno/ASR/tdnn/
to get a feel for what the resulting files look like.
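As a rough illustration of what model.py holds, here is a minimal sketch for the hypothetical bar model. The class name, layer sizes, and the log-softmax output are placeholders chosen for the example, not a requirement of icefall.

# A minimal sketch of model.py for the hypothetical "bar" model.
# The architecture and all names/sizes are placeholders for illustration.
import torch
import torch.nn as nn


class Bar(nn.Module):
    def __init__(self, num_features: int, num_classes: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.output = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Args:
          x: a 3-D tensor of shape (batch, time, num_features).
        Returns:
          Log-probabilities of shape (batch, time, num_classes).
        """
        x = self.encoder(x)
        x = self.output(x)
        return nn.functional.log_softmax(x, dim=-1)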
Note
Every model in a recipe is kept as self-contained as possible. We tolerate duplicated code among different recipes.
The training stage should be invocable by:
$ cd egs/foo/ASR
$ ./bar/train.py
$ ./bar/train.py --help
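The exact contents of train.py differ from recipe to recipe, but the skeleton below shows one way to support the invocations above, i.e., running the script directly and passing --help. The command-line options shown are placeholders for illustration.

#!/usr/bin/env python3
# A skeleton of train.py showing why ./bar/train.py and ./bar/train.py --help
# both work. The command-line options are placeholders for illustration.
import argparse


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train the bar model on foo.")
    parser.add_argument("--num-epochs", type=int, default=10)
    parser.add_argument("--exp-dir", type=str, default="bar/exp")
    return parser


def main() -> None:
    args = get_parser().parse_args()
    # ... build dataloaders from asr_datamodule.py, create the model from
    # model.py, run the training loop, and save checkpoints to args.exp_dir.
    print(f"Training for {args.num_epochs} epochs; checkpoints in {args.exp_dir}")


if __name__ == "__main__":
    main()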
Decoding
Please refer to one of the following decode.py scripts, depending on your model:

If your model is transformer/conformer based:
https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/conformer_ctc/decode.py

If your model is TDNN/LSTM based, i.e., there is no attention decoder:
https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py

If there is no LM rescoring:
https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/tdnn/decode.py
The decoding stage should be invocable by:
$ cd egs/foo/ASR
$ ./bar/decode.py
$ ./bar/decode.py --help
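As a rough sketch, decode.py typically loads one of the checkpoints written by train.py and decodes the test set(s). The checkpoint file name, the layout of the checkpoint dict, and the model hyperparameters below are assumptions for illustration.

#!/usr/bin/env python3
# A skeleton of decode.py: load a checkpoint saved by train.py and decode the
# test set(s). The checkpoint name/layout and the hyperparameters are
# assumptions for illustration.
import argparse

import torch

from model import Bar  # the hypothetical model defined in model.py


def main() -> None:
    parser = argparse.ArgumentParser(description="Decode with the bar model.")
    parser.add_argument("--epoch", type=int, default=10)
    parser.add_argument("--exp-dir", type=str, default="bar/exp")
    args = parser.parse_args()

    model = Bar(num_features=23, num_classes=4)  # placeholder sizes
    checkpoint = torch.load(
        f"{args.exp_dir}/epoch-{args.epoch}.pt", map_location="cpu"
    )
    model.load_state_dict(checkpoint["model"])  # assumed checkpoint layout
    model.eval()

    # ... iterate over the test dataloader from asr_datamodule.py, compute
    # log-probabilities with the model, and turn them into transcripts.
    ...


if __name__ == "__main__":
    main()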
Pre-trained model
Please demonstrate how to use your model for inference in egs/foo/ASR/bar/pretrained.py.
If possible, please consider creating a Colab notebook to show that.
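The sketch below shows one possible shape for pretrained.py, assuming kaldi-style fbank features computed with torchaudio and a checkpoint that stores the model state dict under a "model" key; both are assumptions made for illustration.

#!/usr/bin/env python3
# A sketch of pretrained.py: run inference on sound files with a trained
# checkpoint. The checkpoint layout, feature configuration, and model sizes
# are assumptions for illustration.
import argparse

import torch
import torchaudio

from model import Bar  # the hypothetical model from model.py


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Inference with a pre-trained bar model."
    )
    parser.add_argument("--checkpoint", type=str, required=True)
    parser.add_argument("sound_files", type=str, nargs="+")
    args = parser.parse_args()

    model = Bar(num_features=23, num_classes=4)  # placeholder sizes
    state = torch.load(args.checkpoint, map_location="cpu")
    model.load_state_dict(state["model"])  # assumed checkpoint layout
    model.eval()

    for f in args.sound_files:
        wave, sample_rate = torchaudio.load(f)
        # Compute kaldi-compatible filter-bank features (one possible choice).
        features = torchaudio.compliance.kaldi.fbank(wave, num_mel_bins=23)
        with torch.no_grad():
            log_probs = model(features.unsqueeze(0))
        # ... map log_probs to a transcript, e.g., with a simple greedy search.
        print(f, log_probs.shape)


if __name__ == "__main__":
    main()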