How to create a recipe
Hint
Please read the section Follow the code style and adjust your code style accordingly.
Caution
icefall is designed to be as Pythonic as possible. Please use Python in your recipe if possible.
Data Preparation
We recommend that you prepare your training/test/validation datasets with lhotse.
Please refer to https://lhotse.readthedocs.io/en/latest/index.html
for how to create a recipe in lhotse.
Hint
The yesno recipe in lhotse is a very good example.
Please refer to https://github.com/lhotse-speech/lhotse/pull/380,
which shows how to add a new recipe to lhotse.
Suppose you would like to add a recipe for a dataset named foo.
You can do the following:
$ cd egs
$ mkdir -p foo/ASR
$ cd foo/ASR
$ touch prepare.sh
$ chmod +x prepare.sh
If your dataset is very simple, please follow
egs/yesno/ASR/prepare.sh
to write your own prepare.sh.
Otherwise, please refer to
egs/librispeech/ASR/prepare.sh
to prepare your data.
Training
Assume you have a fancy model, called bar, for the foo recipe. You can
organize your files in the following way:
$ cd egs/foo/ASR
$ mkdir bar
$ cd bar
$ touch README.md model.py train.py decode.py asr_datamodule.py pretrained.py
For instance, the yesno recipe has a tdnn model, and its directory structure
looks like the following:
egs/yesno/ASR/tdnn/
|-- README.md
|-- asr_datamodule.py
|-- decode.py
|-- model.py
|-- pretrained.py
`-- train.py
File description:
README.md
It contains information about this recipe, e.g., how to run it, what the WER is, etc.
asr_datamodule.py
It provides code to create PyTorch dataloaders for the train/test/validation datasets.
decode.py
It takes as inputs the checkpoints saved during the training stage to decode the test dataset(s).
model.py
It contains the definition of your fancy neural network model.
pretrained.py
We can use this script to do inference with a pre-trained model.
train.py
It contains training code.
Hint
Please take a look at the files listed above to get a feel for what the resulting files look like.
Note
Every model in a recipe is kept as self-contained as possible. We tolerate duplicated code among different recipes.
The training stage should be invocable by:
$ cd egs/foo/ASR
$ ./bar/train.py
$ ./bar/train.py --help
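To support an invocation like the one above, train.py needs a command-line interface and a training loop. The following is a minimal stdlib-only sketch; the option names (--num-epochs, --exp-dir) are illustrative assumptions, not icefall's actual interface, and the loop body is a stub standing in for real training code.

```python
#!/usr/bin/env python3
# Minimal sketch of bar/train.py: argument parsing plus a stub
# training loop. Option names here are illustrative assumptions.
import argparse


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train the bar model on foo")
    parser.add_argument(
        "--num-epochs", type=int, default=10, help="Number of training epochs"
    )
    parser.add_argument(
        "--exp-dir",
        type=str,
        default="bar/exp",
        help="Directory to save checkpoints and logs",
    )
    return parser


def main(argv=None) -> None:
    args = get_parser().parse_args(argv)
    for epoch in range(args.num_epochs):
        # Real code would iterate over dataloaders from asr_datamodule.py,
        # compute the loss, and save a checkpoint per epoch.
        print(f"epoch {epoch}: training stub, saving to {args.exp_dir}")


if __name__ == "__main__":
    # Pass [] so the sketch runs with defaults when executed directly.
    main([])
```

Because argparse generates --help automatically, `./bar/train.py --help` works as soon as the parser is defined.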
Decoding
Please refer to:
https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/conformer_ctc/decode.py
if your model is transformer/conformer based;
https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py
if your model is TDNN/LSTM based, i.e., there is no attention decoder;
https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/tdnn/decode.py
if there is no LM rescoring.
The decoding stage should be invocable by:
$ cd egs/foo/ASR
$ ./bar/decode.py
$ ./bar/decode.py --help
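Decode scripts in icefall commonly select checkpoints from the last few epochs and average their parameters before decoding. The following is a simplified sketch of that averaging step only, using plain dicts of floats in place of torch state dicts; the checkpoint names are made up for illustration.

```python
# Simplified sketch of checkpoint averaging: average corresponding
# parameters across several checkpoints. Real code operates on torch
# state dicts; plain dicts of floats are used here for illustration.
from typing import Dict, List


def average_checkpoints(states: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the element-wise average of several 'state dicts'."""
    assert states, "need at least one checkpoint"
    avg = {k: 0.0 for k in states[0]}
    for state in states:
        for k, v in state.items():
            avg[k] += v
    n = len(states)
    return {k: v / n for k, v in avg.items()}


# Hypothetical "checkpoints" from the last two epochs.
ckpt_9 = {"linear.weight": 0.2, "linear.bias": -1.0}
ckpt_10 = {"linear.weight": 0.4, "linear.bias": -3.0}
print(average_checkpoints([ckpt_9, ckpt_10]))
```

Averaging the last few checkpoints usually gives a small WER improvement over using a single checkpoint, which is why decode scripts expose it as an option.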
Pre-trained model
Please demonstrate how to use your model for inference in egs/foo/ASR/bar/pretrained.py.
If possible, please consider creating a Colab notebook to show that.
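A pretrained.py script typically takes a checkpoint path and one or more sound files on the command line. The sketch below shows only such an interface; the option names are illustrative assumptions, and real inference code would additionally load the checkpoint and run the model on the sound files.

```python
#!/usr/bin/env python3
# Sketch of a command-line interface for bar/pretrained.py.
# Option names are illustrative assumptions; real inference would
# load the checkpoint and transcribe the given sound files.
import argparse


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Decode sound files with a pre-trained bar model"
    )
    parser.add_argument(
        "--checkpoint",
        type=str,
        required=True,
        help="Path to the pre-trained checkpoint",
    )
    parser.add_argument(
        "sound_files", type=str, nargs="+", help="Sound file(s) to transcribe"
    )
    return parser


if __name__ == "__main__":
    # Demo invocation with hypothetical arguments.
    args = get_parser().parse_args(
        ["--checkpoint", "bar/exp/pretrained.pt", "test.wav"]
    )
    print(f"would decode {args.sound_files} with {args.checkpoint}")
```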