Lookup-Table Recurrent Language Models for Long Tail Speech Recognition
(3 minutes introduction)

W. Ronny Huang (Google, USA), Tara N. Sainath (Google, USA), Cal Peyser (Google, USA), Shankar Kumar (Google, USA), David Rybach (Google, USA), Trevor Strohman (Google, USA)

We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in floating point operations, by increasing the expressivity of the embedding table. In particular, we instantiate an additional embedding table that embeds the previous n-gram token sequence rather than a single token. This allows the embedding table to be scaled up arbitrarily, with a commensurate increase in performance, without changing the token vocabulary. Since embeddings are sparsely retrieved from the table via a lookup, increasing the size of the table adds neither extra operations to each forward pass nor extra parameters that need to be stored on limited GPU/TPU memory. We explore scaling n-gram embedding tables up to nearly a billion parameters. When trained on a 3-billion-sentence corpus, we find that LookupLM improves long tail log perplexity by 2.44 and long tail WER by 23.4% on a downstream speech recognition task over a standard RNN language model baseline, an improvement comparable to scaling up the baseline by 6.2× in the number of floating point operations.
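
The n-gram lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the padding ID 0, the toy table sizes, and the choice to map an n-gram to a table row by reading it as a base-vocabulary number and reducing it modulo the table size are all assumptions made for illustration. The point it shows is that each step performs two row lookups and one concatenation, regardless of how many rows the n-gram table holds.

```python
import random


def make_table(num_rows, dim, seed):
    """Build a randomly initialised embedding table as a list of rows."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def ngram_row(prev_tokens, vocab_size, num_rows):
    """Map the previous n token IDs to a row index of the n-gram table.

    Assumption of this sketch: the n-gram is read as a base-`vocab_size`
    number and reduced modulo the table size, so distinct n-grams may collide.
    """
    ngram_id = 0
    for tok in prev_tokens:
        ngram_id = ngram_id * vocab_size + tok
    return ngram_id % num_rows


def lookup_input(history, n, token_table, ngram_table):
    """Return the RNN input vector for the current step (history is non-empty).

    Cost: two row lookups plus a concatenation, independent of the number of
    rows in the n-gram table.
    """
    vocab_size = len(token_table)
    prev = history[-n:]
    # Left-pad short histories with token 0 (hypothetical padding ID).
    prev = [0] * (n - len(prev)) + prev
    row = ngram_row(prev, vocab_size, len(ngram_table))
    # Concatenate the single-token embedding with the n-gram embedding.
    return token_table[history[-1]] + ngram_table[row]


# Toy usage: growing the n-gram table changes neither the input width
# nor the work done per step.
token_table = make_table(num_rows=16, dim=4, seed=0)
small = make_table(num_rows=64, dim=4, seed=1)
large = make_table(num_rows=4096, dim=4, seed=2)
x_small = lookup_input([3, 7, 5], 2, token_table, small)
x_large = lookup_input([3, 7, 5], 2, token_table, large)
```

In this sketch both `x_small` and `x_large` have the same width (token dimension plus n-gram dimension), so the recurrent layers that consume them are unchanged when the table grows; only the amount of memory holding the table differs.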

InterSpeech 2021

Related Recordings

Phonetically Induced Subwords for End-to-End Speech Recognition
(3 minutes introduction)

Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems
(3 minutes introduction)
