Regularizing Word Segmentation by Creating Misspellings (3 minutes introduction)

Hainan Xu (Google, USA), Kartik Audhkhasi (Google, USA), Yinghui Huang (Google, USA), Jesse Emond (Google, USA), Bhuvana Ramabhadran (Google, USA)

This work focuses on improving subword segmentation algorithms for end-to-end speech recognition models, and makes two major contributions. First, we propose a novel word segmentation algorithm. The algorithm uses the same vocabulary generated by a regular wordpiece model, is easily extensible, supports a variety of regularization techniques in the segmentation space, and outperforms the regular wordpiece model. Second, we propose a number of novel regularization methods that introduce randomness into the tokenization algorithm. These bring further improvements in speech recognition accuracy, with relative gains of up to 8.4% over the original wordpiece model. We analyze the methods and show that they are equivalent to a sophisticated form of label smoothing, which performs smoothing based on the prefix structures of subword units. A noteworthy discovery from this work is that creating artificial misspellings in words yields the best performance among all the methods, which could inspire future research on strategies in this area.
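The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It sketches a left-to-right prefix-matching segmenter over a fixed vocabulary, a randomized variant that sometimes picks a shorter matching prefix, and a simple character-substitution "misspelling" applied before segmentation. The toy vocabulary, the function names `segment` and `misspell`, and the `sample_prob` parameter are all assumptions made for this example.

```python
import random

# Hypothetical toy vocabulary. Single characters are included so that
# every word over this alphabet can always be segmented.
VOCAB = {"h", "e", "l", "o", "w", "r", "d",
         "he", "hel", "hello", "lo", "ld", "wor", "world"}


def segment(word, vocab=VOCAB, sample_prob=0.0, rng=None):
    """Segment `word` left to right using vocabulary entries that are prefixes.

    With sample_prob == 0 this is plain greedy longest-match. Otherwise, at
    each position, the longest match is replaced by a uniformly chosen
    shorter match with probability sample_prob (an assumed, simplified form
    of segmentation-space regularization).
    """
    rng = rng or random.Random()
    pieces, i = [], 0
    while i < len(word):
        # All vocabulary entries that prefix the remaining text, longest first.
        matches = [word[i:j] for j in range(len(word), i, -1)
                   if word[i:j] in vocab]
        if not matches:
            raise ValueError(f"cannot segment {word!r} at position {i}")
        choice = matches[0]
        if len(matches) > 1 and rng.random() < sample_prob:
            choice = rng.choice(matches[1:])
        pieces.append(choice)
        i += len(choice)
    return pieces


def misspell(word, alphabet, rng):
    """Replace one randomly chosen character with a random one from `alphabet`."""
    pos = rng.randrange(len(word))
    return word[:pos] + rng.choice(alphabet) + word[pos + 1:]


if __name__ == "__main__":
    rng = random.Random(0)
    print(segment("hello"))                             # deterministic greedy
    print(segment("hello", sample_prob=0.5, rng=rng))   # randomized
    print(segment(misspell("hello", "helowrd", rng)))   # segment a misspelling
```

In this sketch the randomized and misspelled variants would be used only to produce training targets, while the deterministic greedy path would be used at inference time; that split is a common convention for subword regularization and is assumed here rather than taken from the abstract.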

InterSpeech 2021

Related Recordings

Towards Lifelong Learning of End-to-end ASR
(longer introduction)

Multitask Training with Text Data for End-to-End Speech Recognition
(3 minutes introduction)
