T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion
Markéta Řezáčková (University of West Bohemia, Czech Republic), Jan Švec (University of West Bohemia, Czech Republic), Daniel Tihelka (University of West Bohemia, Czech Republic)
Despite the increasing popularity of end-to-end text-to-speech (TTS) systems, an accurate grapheme-to-phoneme (G2P) module remains a crucial part of systems that rely on phonetic input. In this paper, we therefore introduce T5G2P, a Text-to-Text Transfer Transformer (T5) neural network model that converts an input text sentence into a phoneme sequence with high accuracy. We evaluate the trained T5 model on English and Czech, because the two languages pose different G2P challenges, including homograph disambiguation, cross-word assimilation, and irregular pronunciation of loanwords. The paper also analyzes the homograph problem in English and offers an alternative approach to Czech phonetic transcription based on detecting pronunciation exceptions.