InterSpeech 2021

Lexical Density Analysis of Word Productions in Japanese English Using Acoustic Word Embeddings
(3-minute introduction)

Shintaro Ando (University of Tokyo, Japan), Nobuaki Minematsu (University of Tokyo, Japan), Daisuke Saito (University of Tokyo, Japan)
In L2 pronunciation, which kinds of phonetic errors most reduce intelligibility? Teachers say that learners’ utterances become unintelligible when words are pronounced with errors that cause them to be misidentified as other words. In this paper, we focus on Japanese English (JE), where the L1 (Japanese) has far fewer phonemes than the L2 (American English, AE). Since learners often substitute L1 phonemes when speaking the L2, some words in JE are expected to be pronounced without sufficient distinction, which may result in word misidentification. This implies that JE words will lie phonetically closer to each other in a space where words are distributed. We therefore carry out a lexical density analysis of JE and AE using acoustic word embeddings. Word productions in JE and AE, extracted from the ERJ corpus, are mapped as points in an acoustic word embedding space obtained by training a network on the WSJ corpus. Experiments show that density is significantly higher in JE than in AE, and that it is also higher for poor learners than for good learners.
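
To make the notion of lexical density concrete, the following is a minimal sketch of one way to compare how crowded two sets of word embeddings are. The abstract does not state which density measure the authors use, so the mean nearest-neighbour cosine distance here is an assumption chosen for illustration, as are the function names and the randomly generated toy vectors, which stand in for real acoustic word embeddings.

```python
import math
import random


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_nearest_neighbour_distance(embeddings):
    """Average distance from each point to its closest other point.

    Assumed density proxy (not taken from the paper): a smaller value
    means the points sit closer together, i.e. the space is denser.
    """
    total = 0.0
    for i, u in enumerate(embeddings):
        total += min(
            cosine_distance(u, v)
            for j, v in enumerate(embeddings)
            if j != i
        )
    return total / len(embeddings)


# Toy data only: random vectors standing in for word embeddings.
random.seed(0)
dim, n_words = 16, 50
spread = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_words)]

# Adding a shared offset pulls every vector toward a common direction,
# which mimics words being produced less distinctively.
crowded = [[x + 3.0 for x in vec] for vec in spread]

spread_score = mean_nearest_neighbour_distance(spread)
crowded_score = mean_nearest_neighbour_distance(crowded)
print(f"spread:  {spread_score:.3f}")
print(f"crowded: {crowded_score:.3f}")
```

Under this proxy, the set with the lower score is the denser one. Applied to real data, the two inputs would be the embeddings of JE and AE productions of the same word list, so that the comparison reflects pronunciation differences and not vocabulary differences.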