Discriminative Self-training for Punctuation Prediction (3 minutes introduction)

Qian Chen (Alibaba, China), Wen Wang (Alibaba, China), Mengzhe Chen (Alibaba, China), Qinglin Zhang (Alibaba, China)

Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of the ASR transcripts and the performance of downstream natural language processing applications. However, achieving good performance on punctuation prediction often requires large amounts of labeled speech transcripts, which are expensive and laborious to obtain. In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. Experimental results on the English IWSLT2011 benchmark test set and an internal Chinese spoken language dataset demonstrate that the proposed approach achieves significant improvement in punctuation prediction accuracy over strong baselines including BERT, RoBERTa, and ELECTRA models. The proposed Discriminative Self-Training approach also outperforms the vanilla self-training approach. We establish a new state-of-the-art (SOTA) on the IWSLT2011 test set, outperforming the current SOTA model by 1.3% absolute gain on F₁.
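
The abstract names two ingredients, a weighted loss and discriminative label smoothing, without defining them. The sketch below shows one plausible reading, not the authors' implementation: tokens with human labels and tokens with pseudo-labels (produced by a teacher model on unlabeled transcripts) each get their own label-smoothing strength, and the pseudo-labeled term is down-weighted. The function names, the parameters `pseudo_weight`, `eps_gold`, and `eps_pseudo`, and their default values are all assumptions made for illustration.

```python
import math

def smoothed_target(label, num_classes, eps):
    """Return a smoothed target: 1 - eps on `label`, eps shared by the other classes."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if k == label else off for k in range(num_classes)]

def cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a (soft) target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0.0)

def discriminative_loss(gold, pseudo, num_classes,
                        pseudo_weight=0.5, eps_gold=0.0, eps_pseudo=0.1):
    """Combine losses over human-labeled and pseudo-labeled tokens.

    gold, pseudo: lists of (predicted_probs, label) pairs, one per token.
    All weights and smoothing values are hypothetical, not taken from the paper.
    """
    def mean_ce(batch, eps):
        if not batch:
            return 0.0
        total = sum(cross_entropy(p, smoothed_target(y, num_classes, eps))
                    for p, y in batch)
        return total / len(batch)

    # Pseudo-labels are noisier, so they get stronger smoothing and a lower weight.
    return mean_ce(gold, eps_gold) + pseudo_weight * mean_ce(pseudo, eps_pseudo)
```

In a self-training loop, a loss of this shape would be minimized by a student model after a teacher trained on the labeled transcripts has tagged the unlabeled ones; the punctuation classes (for example none, comma, period, question mark) would determine `num_classes`.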

InterSpeech 2021

Related Recordings

Disfluency Detection with Unlabeled Data and Small BERT Models
(3 minutes introduction)

A noise robust method for word-level pronunciation assessment
(3 minutes introduction)
