Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing
(3 minutes introduction)

Benjamin van Niekerk (Stellenbosch University, South Africa), Leanne Nortje (Stellenbosch University, South Africa), Matthew Baas (Stellenbosch University, South Africa), Herman Kamper (Stellenbosch University, South Africa)

Contrastive predictive coding (CPC) aims to learn representations of speech by distinguishing future observations from a set of negative examples. Previous work has shown that linear classifiers trained on CPC features can accurately predict speaker and phone labels. However, it is unclear how the features actually capture speaker and phonetic information, and whether it is possible to normalize out the irrelevant details (depending on the downstream task). In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent. Concretely, we find that comparing means performs well on a speaker verification task. Next, probing experiments show that standardizing the features effectively removes speaker information. Based on this observation, we propose a speaker normalization step to improve acoustic unit discovery using K-means clustering of CPC features. Finally, we show that a language model trained on the resulting units achieves some of the best results in the ZeroSpeech2021 Challenge.
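
The three operations named in the abstract (comparing per-utterance means, standardizing features, and K-means clustering) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes features arrive as a (frames, dimensions) array per utterance, that means are compared with cosine similarity, and that standardization statistics are computed per utterance. The function names and the plain Lloyd-style K-means loop are hypothetical choices made for this example.

```python
import numpy as np


def utterance_mean(features):
    """Average over frames: one vector summarizing the utterance."""
    return features.mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (assumed scoring function)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def standardize(features, eps=1e-8):
    """Per-utterance standardization (assumption: statistics per utterance).

    Each feature dimension is shifted to zero mean and scaled to unit
    variance, which removes the utterance-level mean from the features.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means, written out to keep the sketch self-contained."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids, labels


# Synthetic stand-in for CPC features: a fixed "speaker" offset plus frame noise.
rng = np.random.default_rng(0)
dim = 16
offset_a = np.concatenate([np.full(dim // 2, 2.0), np.zeros(dim // 2)])
offset_b = np.concatenate([np.zeros(dim // 2), np.full(dim // 2, 2.0)])
utt_a1 = offset_a + rng.normal(size=(200, dim))
utt_a2 = offset_a + rng.normal(size=(200, dim))
utt_b = offset_b + rng.normal(size=(200, dim))

# Comparing means: same-speaker pairs should score higher than different-speaker pairs.
same_score = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_a2))
diff_score = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_b))

# Normalize each utterance, then cluster the pooled frames into discrete units.
pooled = np.concatenate([standardize(u) for u in (utt_a1, utt_a2, utt_b)])
centroids, unit_labels = kmeans(pooled, k=4)
```

In this toy setup the speaker identity lives entirely in the per-utterance offset, so standardizing removes it by construction. Real CPC features need not behave so cleanly.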

Related Recordings

Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language
(3 minutes introduction)

Christiaan Jacobs, Herman Kamper

The Zero Resource Speech Challenge 2021: Spoken language modelling
(3 minutes introduction)

Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux

InterSpeech 2021
