A neural network-based noise compensation method for pronunciation assessment (3 minutes introduction)

Binghuai Lin (Tencent, China), Liyuan Wang (Tencent, China)

Automatic pronunciation assessment plays an important role in computer-assisted pronunciation training (CAPT). Goodness of pronunciation (GOP) based on automatic speech recognition (ASR) is commonly used for pronunciation assessment, but its performance typically deteriorates under noisy conditions. Traditional noise compensation methods, which correct the distorted GOP using a Gaussian mixture model (GMM) or other simple mapping functions, ignore the contextual influence and phonemic attributes of the utterance. This usually leads to a lack of robustness when conditions change. In this paper, we adopt a bidirectional long short-term memory (BLSTM) network that incorporates phonemic attributes to compensate for the distorted GOP under noisy conditions. We evaluate the model on English words recorded by Chinese learners in clean and noisy conditions. Experimental results show that the proposed model outperforms the traditional baselines in Pearson correlation coefficient (PCC) and accuracy for pronunciation assessment under various noisy conditions.

Related Recordings

Acquisition of prosodic focus marking by three- to six-year-old children learning Mandarin Chinese
(3 minutes introduction)

Qianyutong Zhang, Kexin Lyu, Zening Chen, Ping Tang

A Preliminary Study on Discourse Prosody Encoding in L1 and L2 English Spontaneous Narratives
(3 minutes introduction)

Yuqing Zhang, Zhu Li, Binghuai Lin, Jinsong Zhang

InterSpeech 2021
