InterSpeech 2021

A neural network-based noise compensation method for pronunciation assessment
(3-minute introduction)

Binghuai Lin (Tencent, China), Liyuan Wang (Tencent, China)
Automatic pronunciation assessment plays an important role in computer-assisted pronunciation training (CAPT). Goodness of pronunciation (GOP) scores derived from automatic speech recognition (ASR) are widely used for pronunciation assessment, but their performance typically deteriorates under noisy conditions. Traditional noise compensation methods, which correct distorted GOP under noise using a Gaussian mixture model (GMM) or other simple mapping functions, ignore the contextual influence and phonemic attributes of the utterance, so they often lack robustness when conditions change. In this paper, we adopt a bidirectional long short-term memory (BLSTM) network that incorporates phonemic attributes to compensate for distorted GOP under noisy conditions. We evaluate the model on English words recorded by Chinese learners in clean and noisy conditions. Experimental results show that the proposed model outperforms the traditional baselines in both Pearson correlation coefficient (PCC) and accuracy of pronunciation assessment under various noisy conditions.
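The abstract contrasts the proposed BLSTM with traditional compensation based on GMMs or simple mapping functions. The Python sketch below illustrates only the pieces around the model, under stated assumptions: a frame-averaged GOP computed from phone posteriors (one common DNN-era variant, assumed here because the abstract does not define GOP precisely), a single global least-squares line from noisy to clean GOP standing in for the "simple mapping function" baseline, and the two metrics the abstract names, PCC and threshold-based accuracy. It is not the authors' BLSTM method; the function names, the 0-is-best GOP scale, the -0.8 decision threshold, and the toy numbers are all hypothetical.

```python
import math


def gop(frame_posteriors, target):
    """Frame-averaged GOP of one phone segment (assumed variant, not the paper's definition).

    frame_posteriors: one dict per frame, mapping phone -> posterior P(q | o_t).
    target: canonical phone expected in this segment.
    Returns (1/T) * sum_t log(P(target | o_t) / max_q P(q | o_t)):
    0 means the target phone wins every frame; more negative means worse.
    """
    total = sum(math.log(p[target] / max(p.values())) for p in frame_posteriors)
    return total / len(frame_posteriors)


def _mean(xs):
    return sum(xs) / len(xs)


def fit_linear_compensation(noisy_gop, clean_gop):
    """Hypothetical 'simple mapping' baseline: least-squares line from noisy to clean GOP.

    One global line for every phone, so it ignores neighbouring phones and
    phonemic attributes, the limitation the abstract attributes to such methods.
    """
    mx, my = _mean(noisy_gop), _mean(clean_gop)
    sxy = sum((x - mx) * (y - my) for x, y in zip(noisy_gop, clean_gop))
    sxx = sum((x - mx) ** 2 for x in noisy_gop)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda g: slope * g + intercept


def pcc(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = _mean(xs), _mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def accuracy(scores, labels, threshold):
    """Fraction of phones whose thresholded GOP (>= threshold means 'correct') matches its label."""
    hits = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return hits / len(labels)


if __name__ == "__main__":
    clean = [-0.1, -0.5, -1.0, -2.0]           # toy clean-condition GOP, not data from the paper
    noisy = [2 * g - 0.3 for g in clean]       # toy distortion: noise stretches and shifts GOP
    labels = [g >= -0.8 for g in clean]        # hypothetical threshold of -0.8
    compensate = fit_linear_compensation(noisy, clean)
    restored = [compensate(g) for g in noisy]
    print(accuracy(noisy, labels, -0.8), accuracy(restored, labels, -0.8))
    print(round(pcc(noisy, clean), 6), round(pcc(restored, clean), 6))
```

On these toy numbers the baseline lifts accuracy from 0.75 to 1.0 while PCC stays at 1.0 before and after. That follows from PCC being unchanged by any affine map with positive slope: a single global linear correction can only move scores back across the decision threshold. Raising PCC needs a compensation that is not one positive affine map, for example one that depends on phone identity and context, as the proposed BLSTM with phonemic attributes does.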