Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models
(Oral presentation)

Mohammad Zeineldeen (RWTH Aachen University, Germany), Aleksandr Glushko (RWTH Aachen University, Germany), Wilfried Michel (RWTH Aachen University, Germany), Albert Zeyer (RWTH Aachen University, Germany), Ralf Schlüter (RWTH Aachen University, Germany), Hermann Ney (RWTH Aachen University, Germany)

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. Integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation, as in the hybrid autoregressive transducer (HAT), suggests dividing by the prior of the discriminative acoustic model, which corresponds to this implicit LM, similar to the hybrid hidden Markov model approach. The implicit LM cannot be calculated efficiently in general, and it is still unclear which methods estimate it best. In this work, we compare different approaches from the literature and propose several novel methods to estimate the ILM directly from the AED model. Our proposed methods outperform all previous approaches. We also investigate other ways to suppress the ILM, mainly by decreasing the capacity of the AED model, limiting the label context, and training the AED model together with a pre-existing LM.
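As a rough illustration of the Bayesian view described above, the "divide by the prior" idea is commonly written as a log-linear score in which the external LM is added and the estimated ILM is subtracted: log p_AED(y|x) + λ_LM · log p_LM(y) − λ_ILM · log p_ILM(y). The Python sketch below rescores a tiny n-best list with that rule. The `Hypothesis` class, the `fused_score` and `rescore` helpers, the scale values, and all log-probabilities are hypothetical placeholders, not the paper's implementation or results. It also applies the combination once per finished hypothesis for simplicity, whereas decoders often apply it at every label step inside beam search.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One n-best entry with hypothetical sequence-level log-probabilities."""
    text: str
    log_p_aed: float  # log p_AED(y | x) from the encoder-decoder model
    log_p_ext: float  # log p_LM(y) from the external LM
    log_p_ilm: float  # estimated log p_ILM(y), the AED's internal LM prior


def fused_score(h: Hypothesis, lm_scale: float, ilm_scale: float) -> float:
    # Multiply by the external LM and divide by the ILM prior,
    # i.e. add and subtract the scaled scores in log space.
    return h.log_p_aed + lm_scale * h.log_p_ext - ilm_scale * h.log_p_ilm


def rescore(hyps: list[Hypothesis], lm_scale: float = 0.5,
            ilm_scale: float = 0.4) -> Hypothesis:
    # Return the hypothesis with the highest combined score.
    return max(hyps, key=lambda h: fused_score(h, lm_scale, ilm_scale))


if __name__ == "__main__":
    nbest = [
        Hypothesis("hypothesis a", log_p_aed=-1.0, log_p_ext=-4.0, log_p_ilm=-1.0),
        Hypothesis("hypothesis b", log_p_aed=-1.5, log_p_ext=-4.2, log_p_ilm=-3.0),
    ]
    print(rescore(nbest, ilm_scale=0.0).text)  # prints: hypothesis a
    print(rescore(nbest).text)                 # prints: hypothesis b
```

With `ilm_scale=0.0` (plain shallow fusion) the first hypothesis wins; subtracting the ILM estimate flips the decision to the second one, which the internal LM prior assigns a lower probability. The scales would normally be tuned on a development set.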

Related Recordings

Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition
(Oral presentation)

Timo Lohrenz, Zhengyang Li, Tim Fingscheidt

Conditional Independence for Pretext Task Selection in Self-Supervised Speech Representation Learning
(Oral presentation)

Salah Zaiem (France), Titouan Parcollet (France), Slim Essid (France)

InterSpeech 2021
