InterSpeech 2021

SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit

Aku Rouhe
SpeechBrain is a novel open-source speech toolkit natively designed to support a wide range of speech and audio processing applications. It currently supports many tasks, including speech recognition, speaker recognition, speech enhancement, speech separation, and multi-microphone signal processing, to name a few. The toolkit is flexible, modular, easy to use, and well documented, and can be used to develop speech technologies quickly. In this tutorial, we present SpeechBrain to INTERSPEECH attendees for the first time. First, the design and general architecture of SpeechBrain will be discussed. Then, its flexibility and simplicity will be shown through practical examples on different speech tasks.

Mirco Ravanelli is currently a postdoctoral researcher at Mila (Université de Montréal), working under the supervision of Prof. Yoshua Bengio. His main research interests are deep learning, speech recognition, far-field speech recognition, cooperative learning, and self-supervised learning. He is the author or co-author of more than 40 papers on these research topics. He received his PhD (with cum laude distinction) from the University of Trento in December 2017. Mirco is an active member of the speech and machine learning communities. He is the founder and leader of the SpeechBrain project.

Titouan Parcollet is an associate professor in computer science at the Laboratoire Informatique d’Avignon (LIA), Avignon University (FR), and a visiting scholar at the Cambridge Machine Learning Systems Lab, University of Cambridge (UK). Previously, he was a senior research associate at the University of Oxford (UK) within the Oxford Machine Learning Systems group. He received his PhD in computer science from the University of Avignon (France), in partnership with Orkis, focusing on quaternion neural networks, automatic speech recognition, and representation learning. His current work involves efficient speech recognition, federated learning, and self-supervised learning. He is also currently collaborating with the Mila - Quebec AI Institute on the SpeechBrain project.