ConvEmformer transducer based streaming ASR

This page describes how to use sherpa for streaming ASR with ConvEmformer transducer models trained with a pruned stateless transducer.

There are no recurrent modules in the transducer model:

  • The encoder network (i.e., the transcription network) is a ConvEmformer model.

  • The decoder network (i.e., the prediction network) is a stateless network, consisting of an nn.Embedding() and an nn.Conv1d().

  • The joiner network (i.e., the joint network) contains an adder, a tanh activation, and a nn.Linear().
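The PyTorch sketch below shows how a stateless decoder and a joiner of the kind described above might fit together. It is an illustration under stated assumptions, not the actual sherpa or icefall code: the class names `StatelessDecoder` and `Joiner`, the vocabulary and embedding sizes, `context_size=2`, and the depthwise convolution are hypothetical choices. The encoder output is replaced with random data of the same width as the decoder output; a real model may first project both to a common joiner dimension.

```python
import torch
import torch.nn as nn


class StatelessDecoder(nn.Module):
    """Hypothetical prediction network with no recurrence: an embedding
    followed by a 1-D convolution over the last `context_size` tokens."""

    def __init__(self, vocab_size: int, embed_dim: int, blank_id: int = 0,
                 context_size: int = 2):
        super().__init__()
        self.context_size = context_size
        # padding_idx keeps the blank token's embedding fixed at zero.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=blank_id)
        # Depthwise conv (assumption); with input length == context_size the
        # output length is 1, i.e. one decoder vector per utterance.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size,
                              groups=embed_dim, bias=False)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, context_size) token IDs -> (N, 1, embed_dim)
        emb = self.embedding(y)      # (N, context_size, embed_dim)
        emb = emb.permute(0, 2, 1)   # (N, embed_dim, context_size) for Conv1d
        out = self.conv(emb)         # (N, embed_dim, 1)
        return out.permute(0, 2, 1)  # (N, 1, embed_dim)


class Joiner(nn.Module):
    """Hypothetical joint network: adder -> tanh -> nn.Linear."""

    def __init__(self, joiner_dim: int, vocab_size: int):
        super().__init__()
        self.output_linear = nn.Linear(joiner_dim, vocab_size)

    def forward(self, encoder_out: torch.Tensor,
                decoder_out: torch.Tensor) -> torch.Tensor:
        # Both inputs are assumed to share the last dimension `joiner_dim`.
        return self.output_linear(torch.tanh(encoder_out + decoder_out))


torch.manual_seed(0)
vocab_size, dim = 500, 512
decoder = StatelessDecoder(vocab_size, dim)
joiner = Joiner(dim, vocab_size)

# Last two emitted tokens for a batch of 3 utterances (0 is blank).
y = torch.tensor([[0, 0], [0, 17], [17, 42]])
decoder_out = decoder(y)
# Stand-in for one ConvEmformer output frame per utterance.
encoder_out = torch.randn(3, 1, dim)
logits = joiner(encoder_out, decoder_out)
print(logits.shape)  # torch.Size([3, 1, 500])
```

Because the decoder only looks at a fixed window of previous tokens, a streaming decoder needs to carry just those token IDs between chunks rather than an RNN hidden state.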

Streaming ASR in this section consists of two components: