InterSpeech 2021

Concept to Code: Semi-Supervised End-To-End Approaches For Speech Recognition

Omprakash Sonie, Kannan Venkateshan
Training Automatic Speech Recognition (ASR) models usually requires transcribing large quantities of audio, which is both expensive and time-consuming. To overcome this limitation, many semi-supervised training approaches have been proposed that take advantage of abundant unpaired audio and text data. In this tutorial, we explain the concepts behind, and the implementation of, two semi-supervised speech applications: ASR and Text-to-Speech (TTS). We begin with the core building blocks: speech pre-processing, the Transformer, Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). We then describe the state-of-the-art approaches in this domain and the key ideas underlying them, and walk through the code that implements them. We provide installation prerequisites and code in Jupyter notebooks, annotated with comments on concepts, key steps, visualizations and results. We believe that a self-contained tutorial, giving a good overview of the core techniques with sufficient mathematical background alongside working code, will be of immense help to participants.

Omprakash Sonie
Om is a data scientist at Flipkart who has been working on Speech Recognition Systems, Recommender Systems and Natural Language Processing. Om is passionate about guiding budding data scientists in machine learning, deep learning and reinforcement learning through the DeepThinking.AI platform. Om organises a local Deep Learning meetup. Om plans to write books on Code to Concept for Machine. Om (as primary author) has presented tutorials and conducted hands-on workshops at KDD, WWW (TheWeb), RecSys (2018, 2019), ECIR, IJCAI, GTC-Nvidia and various meetups.

Kannan Venkateshan
Kannan Venkateshan is a data scientist at Flipkart who is currently working on speech recognition.
He has previously worked on diverse problems in complex networks, information theory, disease modeling, dynamic assignment algorithms and vehicle route optimization, among others. He holds a PhD in theoretical physics.