Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021 (Oral presentation)

Pablo Gimeno (Universidad de Zaragoza, Spain), Alfonso Ortega (Universidad de Zaragoza, Spain), Antonio Miguel (Universidad de Zaragoza, Spain), Eduardo Lleida (Universidad de Zaragoza, Spain)

In this paper, we describe the ViVoLab speech activity detection (SAD) system submitted to the Fearless Steps Challenge Phase III. Over the last few years, this series of challenges has proposed a number of speech processing tasks dealing with audio from the Apollo space missions. This edition focuses on the generalisation capabilities of the systems, with new evaluation data from different channels. Our submission is based on the unsupervised representation learning paradigm, which seeks an audio representation that is more discriminative than traditional perceptual features such as log Mel-filterbank energies. These new features are used to train different variations of a convolutional recurrent neural network (CRNN). Experimental results show that features learned via unsupervised learning provide a much more robust representation, significantly reducing the mismatch observed between development and evaluation partition results. The results obtained largely outperform the organisers' baseline, achieving a DCF metric of 2.98% on the evaluation set and ranking third among all participating teams.
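The abstract reports its result as a DCF (detection cost function) value without defining it. As a rough illustration only, the sketch below computes a weighted detection cost from missed-speech and false-alarm durations. The 0.75/0.25 weights reflect our understanding of the challenge's SAD metric and are an assumption here, since the abstract does not state them; the function name and the example durations are hypothetical and are not taken from the paper.

```python
def detection_cost(fn_time, speech_time, fp_time, nonspeech_time,
                   w_fn=0.75, w_fp=0.25):
    """Weighted detection cost from durations given in seconds.

    fn_time:        speech time labelled as non-speech (misses)
    speech_time:    total reference speech time
    fp_time:        non-speech time labelled as speech (false alarms)
    nonspeech_time: total reference non-speech time
    w_fn, w_fp:     assumed weights, not stated in the abstract
    """
    if speech_time <= 0 or nonspeech_time <= 0:
        raise ValueError("reference durations must be positive")
    p_fn = fn_time / speech_time       # miss rate
    p_fp = fp_time / nonspeech_time    # false-alarm rate
    return w_fn * p_fn + w_fp * p_fp


# Made-up durations, chosen only to show the arithmetic.
dcf = detection_cost(fn_time=12.0, speech_time=600.0,
                     fp_time=30.0, nonspeech_time=500.0)
print(f"DCF = {100 * dcf:.2f}%")
```

With these weights a missed second of speech costs three times as much as a falsely detected one, so a system tuned for this metric would tend to favour a lower decision threshold.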

InterSpeech 2021

Related Recordings

The Application of Learnable STRF Kernels to the 2021 Fearless Steps Phase-03 SAD Challenge
(Oral presentation)

Speech Activity Detection Based on Multilingual Speech Recognition System
(Oral presentation)
