InterSpeech 2021

Anonymous speaker clusters: Making distinctions between anonymised speech recordings with clustering interface

Benjamin O’Brien (LPL (UMR 7309), France), Natalia Tomashenko (LIA (EA 4128), France), Anaïs Chanclu (LIA (EA 4128), France), Jean-François Bonastre (LIA (EA 4128), France)
Our study examined the performance of evaluators tasked with grouping natural and anonymised speech recordings into clusters based on their perceived similarities. Speech stimuli were selected from the VCTK corpus, and two systems developed for the VoicePrivacy 2020 Challenge were used for anonymisation. The Baseline-1 (B1) system was built on x-vectors and neural waveform models, while the Baseline-2 (B2) system relied on digital-signal-processing techniques. 74 evaluators completed three trials, each composed of 16 recordings of either natural speech or anonymised speech generated by a single system. F-measure and cluster purity metrics were used to assess evaluator accuracy. Probabilistic linear discriminant analysis (PLDA) scores from an automatic speaker verification system were generated to quantify the similarity between recordings and were correlated with the subjective results. Our findings showed that non-native English-speaking evaluators had significantly lower F-measure means when presented with anonymised recordings, while no significant effect was observed for cluster purity. Pearson correlation procedures revealed that PLDA scores generated from natural and B2-anonymised speech recordings correlated positively with F-measure and cluster purity metrics. These findings show that evaluators were able to use the interface to cluster natural and anonymised speech recordings, and they suggest that anonymisation systems modelled like B1 are more effective at suppressing identifiable speech characteristics.
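To illustrate the evaluation metrics named in the abstract, below is a minimal sketch of cluster purity and a pairwise F-measure for comparing an evaluator's clustering against the true speaker labels. This is one common formulation of these metrics; the paper itself may use a different variant (e.g. a per-class F-measure), so the definitions here are illustrative assumptions, not the authors' exact implementation.

```python
from collections import Counter
from itertools import combinations

def cluster_purity(pred, truth):
    # Purity: for each predicted cluster, count the most frequent true
    # speaker label, sum over clusters, and divide by the number of items.
    clusters = {}
    for c, t in zip(pred, truth):
        clusters.setdefault(c, []).append(t)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(truth)

def pairwise_f_measure(pred, truth):
    # Pairwise F-measure: a pair of recordings is a true positive when both
    # the evaluator and the ground truth place them in the same cluster.
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a perfect clustering yields purity and F-measure of 1.0, while merging two speakers into one cluster lowers precision (and hence the F-measure) without necessarily reducing purity as sharply, which is why the two metrics can diverge, as reported in the abstract.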