|Binghuai Lin (Tencent, China), Liyuan Wang (Tencent, China)|
The common approach to pronunciation evaluation is based on goodness of pronunciation (GOP). It has been found that GOP may perform worse under noisy conditions. Traditional methods compensate the pronunciation features to improve assessment performance in noise. This paper proposes a noise-robust model for word-level pronunciation assessment based on domain adversarial training (DAT). We treat pronunciation assessment under clean and noisy conditions as the source and target domains, respectively. The network is optimized jointly for pronunciation assessment and noise domain discrimination. The domain labels are generated by unsupervised methods so that the model can adapt to various noise conditions. We evaluate the model on English words recorded by Chinese learners of English and labeled by three experts. Experimental results show that, on average, the proposed model outperforms the baseline by 3% in Pearson correlation coefficient (PCC) and 4% in accuracy under different noise conditions.
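The joint optimization of pronunciation assessment and noise domain discrimination can be illustrated with a toy example. The sketch below is not the paper's network: it assumes a scalar linear model, a squared-error scoring loss, a binary clean/noisy domain label, and a gradient-reversal update (a common way to implement DAT, which the abstract does not confirm). The names `dat_gradients`, `w`, `a`, `b`, and `lam` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dat_gradients(x, score, domain, w, a, b, lam):
    """Gradients for one example of a toy scalar DAT model (illustrative only).

    feature  h = w * x           shared feature extractor
    score    s = a * h           pronunciation-score head, squared error
    domain   p = sigmoid(b * h)  noise-domain head, binary cross-entropy
    """
    h = w * x
    s = a * h
    p = sigmoid(b * h)

    # Task loss 0.5 * (s - score)^2 and its gradients.
    d_s = s - score
    grad_a = d_s * h
    grad_w_task = d_s * a * x

    # Domain loss -[y log p + (1 - y) log(1 - p)]; d(loss)/d(logit) = p - y.
    d_logit = p - domain
    grad_b = d_logit * h
    grad_w_domain = d_logit * b * x

    # Gradient reversal (assumed): the shared extractor descends the task
    # loss but ascends the domain loss, scaled by lam, so that its features
    # carry less information about the noise domain.
    grad_w = grad_w_task - lam * grad_w_domain
    return grad_w, grad_a, grad_b


# One example whose score is already predicted exactly (s == score),
# so only the adversarial term moves the shared weight.
grad_w, grad_a, grad_b = dat_gradients(
    x=1.0, score=1.0, domain=1, w=1.0, a=1.0, b=2.0, lam=1.0
)
```

In this setup the two heads follow their own gradients (`grad_a`, `grad_b`) as usual, while the shared weight receives the domain gradient with its sign flipped. Setting `lam` to 0 recovers ordinary training on the scoring loss alone.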