Speaker-basis Accent Clustering Using Invariant Structure Analysis and the Speech Accent Archive

Nobuaki Minematsu, Shun Kasahara, Takehiko Makino, Daisuke Saito and Keikichi Hirose

English is the only language available for global communication and is used by 1.5 billion speakers. It is also known for its wide diversity of pronunciation, called accents, which arises from the influence of speakers’ mother tongues. Our project aims to create a global, speaker-basis map of English accents for use in learning World Englishes and in research on World Englishes [1, 2]. Creating the map, i.e., speaker-basis accent clustering, mathematically requires a matrix of accent distances among all the speakers considered, and technically requires a method of predicting the accent distance between any pair of speakers from their speech samples alone. Our first trials were presented in [3, 4], where invariant structure analysis was used effectively for feature extraction. However, the experiments revealed some technical problems. This paper presents recent progress, together with additional explanation of the invariant structure that was omitted from [3, 4] due to space limitations. Using the invariant structure with Support Vector Regression yields striking distance-prediction performance in a speaker-pair-open mode, but the performance is not sufficient in a speaker-open mode.
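To make the clustering step concrete, here is a minimal sketch of how a speaker-by-speaker accent distance matrix can be turned into speaker groups. It assumes average-linkage agglomerative clustering, which is an illustrative choice and not necessarily the method used in the paper. The function name `cluster_speakers` and the toy distances are hypothetical; in the setting described above, the matrix entries would be the predicted accent distances between speaker pairs.

```python
from itertools import combinations


def cluster_speakers(dist, n_clusters):
    """Group speakers by average-linkage agglomerative clustering.

    dist is a symmetric matrix (list of lists) where dist[i][j] is the
    accent distance between speakers i and j. Hypothetical helper, written
    for illustration only.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Start with every speaker in a cluster of their own.
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the two clusters that are closest on average and merge them.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Toy, made-up accent distances for four speakers (not data from the paper).
toy = [
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [1.0, 0.9, 0.1, 0.0],
]
groups = cluster_speakers(toy, 2)
print(groups)
```

With these toy values, speakers 0 and 1 end up in one group and speakers 2 and 3 in the other. The quality of any such map depends entirely on how well the pairwise distances are predicted, which is the problem the abstract focuses on.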

Odyssey 2014

The Speaker and Language Recognition Workshop

