InterSpeech 2021

Lexical Modeling of ASR Errors for Robust Speech Translation

Giuseppe Martucci (Università di Trento, Italy), Mauro Cettolo (FBK, Italy), Matteo Negri (FBK, Italy), Marco Turchi (FBK, Italy)

Error propagation from automatic speech recognition (ASR) to machine translation (MT) is a critical issue for the (still) dominant cascade approach to speech translation. To make MT robust to ill-formed inputs, we propose a technique that artificially corrupts clean transcripts so as to emulate noisy automatic transcripts. Our Lexical Noise model relies on estimating from ASR data: i) the probability distribution of the possible edit operations applicable to each word, and ii) the probability distribution of possible lexical substitutes for that word. Corrupted data generated from these probabilities are paired with their original clean counterparts for MT adaptation via fine-tuning. Contrastive experiments on three language pairs led to three main findings. First, on noisy transcripts, the adapted models outperform MT systems fine-tuned on synthetic data corrupted with previous noising techniques, approaching the upper-bound performance obtained by fine-tuning on real ASR data. Second, the increased robustness does not come at the cost of performance drops on clean test data. Third, and crucially from the application standpoint, our approach is domain- and ASR-independent: noising patterns learned from a given ASR system in a certain domain can be successfully applied to make MT robust to errors made by other ASR systems in a different domain.
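
The two distributions described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that clean and ASR transcripts have already been aligned word by word, it restricts the edit operations to keep, substitute, and delete (insertions are left out for brevity), and the names `estimate_noise_model`, `corrupt`, and the toy `pairs` data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def normalize(counter):
    """Turn raw counts into a probability distribution."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def estimate_noise_model(aligned_pairs):
    """Estimate per-word edit-operation and substitute distributions.

    aligned_pairs: iterable of (clean_word, asr_word), where asr_word is
    None when the ASR output dropped the word (assumed input format).
    """
    op_counts = defaultdict(Counter)
    sub_counts = defaultdict(Counter)
    for clean, hyp in aligned_pairs:
        if hyp is None:
            op_counts[clean]["delete"] += 1
        elif hyp == clean:
            op_counts[clean]["keep"] += 1
        else:
            op_counts[clean]["substitute"] += 1
            sub_counts[clean][hyp] += 1
    op_probs = {word: normalize(c) for word, c in op_counts.items()}
    sub_probs = {word: normalize(c) for word, c in sub_counts.items()}
    return op_probs, sub_probs


def corrupt(sentence, op_probs, sub_probs, rng):
    """Sample a noisy version of a clean, whitespace-tokenized sentence."""
    noisy = []
    for word in sentence.split():
        ops = op_probs.get(word)
        if ops is None:
            # Word never observed in the ASR data: leave it unchanged.
            noisy.append(word)
            continue
        op = rng.choices(list(ops), weights=list(ops.values()))[0]
        if op == "keep":
            noisy.append(word)
        elif op == "substitute":
            subs = sub_probs[word]
            noisy.append(rng.choices(list(subs), weights=list(subs.values()))[0])
        # "delete": the word is simply not emitted.
    return " ".join(noisy)


# Toy aligned data (hypothetical, for illustration only).
pairs = [
    ("two", "to"), ("two", "to"),
    ("their", "there"), ("their", "their"),
    ("the", None),
]
op_probs, sub_probs = estimate_noise_model(pairs)
noisy_sentence = corrupt("the two cats", op_probs, sub_probs, random.Random(0))
```

Each sampled `noisy_sentence` would then be paired with its clean source sentence and the corresponding reference translation to build fine-tuning data, as the abstract describes. Sampling per word keeps the sketch simple; it ignores context-dependent errors and word insertions.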