Using Transposed Convolution for Articulatory-to-Acoustic Conversion from Real-Time MRI Data
(3 minutes introduction)

Ryo Tanji (Tokyo University of Science, Japan), Hidefumi Ohmura (Tokyo University of Science, Japan), Kouichi Katsurada (Tokyo University of Science, Japan)

We propose a deep neural network-based model for articulatory-to-acoustic conversion from real-time MRI (rtMRI) data. Although rtMRI can record the entire set of articulatory organs at high resolution, which is an advantage for articulatory-to-acoustic conversion, it has a relatively low sampling rate. To address this, we incorporate super-resolution in the temporal dimension using transposed convolution. Transposed convolution increases the resolution by inverting the resolution-reducing operation of a standard CNN. To evaluate performance on datasets with different temporal resolutions, we conducted experiments on two datasets: USC-TIMIT and a Japanese rtMRI dataset. Results evaluated with mel-cepstrum distortion and PESQ showed that transposed convolution is effective for generating accurate acoustic features. We also confirmed that increasing the magnification of the super-resolution improves the PESQ score.
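As a rough illustration of how a transposed convolution raises temporal resolution, the following NumPy sketch upsamples a sequence of per-frame feature vectors along the time axis. It is not the authors' model: the function name `transposed_conv1d_time`, the fixed kernel, and the stride are hypothetical choices made for this example, whereas a trained network would learn the kernel weights and operate on image-derived features.

```python
import numpy as np


def transposed_conv1d_time(x, kernel, stride):
    """Upsample frames along time with a 1-D transposed convolution.

    x      : array of shape (T, D), T frames with D features each
    kernel : array of shape (K,), weights shared across all features
             (fixed here for illustration; learned in a real network)
    stride : temporal magnification factor
    Returns an array of shape ((T - 1) * stride + K, D).
    """
    T, D = x.shape
    K = kernel.shape[0]
    out = np.zeros(((T - 1) * stride + K, D), dtype=float)
    for t in range(T):
        # Each input frame scatters a kernel-weighted copy of itself
        # into K consecutive output frames starting at t * stride.
        out[t * stride : t * stride + K] += kernel[:, None] * x[t][None, :]
    return out


# Hypothetical input: 4 low-rate frames, 2 features per frame.
frames = np.arange(8, dtype=float).reshape(4, 2)

# With kernel length equal to the stride, the windows do not overlap,
# so an all-ones kernel simply repeats every frame `stride` times.
up = transposed_conv1d_time(frames, np.ones(2), stride=2)

# A longer kernel makes neighbouring windows overlap and sum.
overlap = transposed_conv1d_time(np.ones((2, 1)), np.ones(3), stride=2)
```

Here the stride plays the role of the super-resolution magnification: a stride of 2 roughly doubles the number of frames, and the kernel determines how each low-rate frame is spread over the new high-rate frames.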

Related Recordings

Vocal-tract models to visualize the airstream of human breath and droplets while producing speech
(3 minutes introduction)

Takayuki Arai

Inhalations in speech: acoustic and physiological characteristics
(3 minutes introduction)

Raphael Werner, Susanne Fuchs, Jürgen Trouvain, Bernd Möbius

InterSpeech 2021
