T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion (Oral presentation)

Markéta Řezáčková (University of West Bohemia, Czech Republic), Jan Švec (University of West Bohemia, Czech Republic), Daniel Tihelka (University of West Bohemia, Czech Republic)

Despite the increasing popularity of end-to-end text-to-speech (TTS) systems, an accurate grapheme-to-phoneme (G2P) module remains a crucial part of systems that rely on phonetic input. In this paper, we therefore introduce T5G2P, a Text-to-Text Transfer Transformer (T5) neural network model that converts an input text sentence into a phoneme sequence with high accuracy. We evaluate the trained T5 model on English and Czech, since the two languages pose different G2P challenges, including homograph disambiguation, cross-word assimilation, and irregular pronunciation of loanwords. The paper also analyzes the homograph problem in English and offers an alternative approach to Czech phonetic transcription based on detecting pronunciation exceptions.

InterSpeech 2021
