InterSpeech 2021

Speech signal analysis and representation I

Estimating articulatory movements in speech production with transformer networks
(3 minutes introduction)

Sathvik Udupa (Indian Institute of Science, India), Anwesha Roy (Indian Institute of Science, India), Abhayjeet Singh (Indian Institute of Science, India), Aravind Illa (Amazon, India), Prasanta Kumar Ghosh (Indian Institute of Science, India)

Estimating articulatory movements in speech production with transformer networks
(longer introduction)

Sathvik Udupa (Indian Institute of Science, India), Anwesha Roy (Indian Institute of Science, India), Abhayjeet Singh (Indian Institute of Science, India), Aravind Illa (Amazon, India), Prasanta Kumar Ghosh (Indian Institute of Science, India)

Speech Decomposition based on a Hybrid Speech Model and Optimal Segmentation
(3 minutes introduction)

Alfredo Esquivel Jaramillo (Aalborg University, Denmark), Jesper Kjær Nielsen (Aalborg University, Denmark), Mads Græsbøll Christensen (Aalborg University, Denmark)

Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
(3 minutes introduction)

Jian Luo (Ping An Technology, China), Jianzong Wang (Ping An Technology, China), Ning Cheng (Ping An Technology, China), Jing Xiao (Ping An Technology, China)

Noise robust pitch stylization using minimum mean absolute error criterion
(3 minutes introduction)

Chiranjeevi Yarra (IIIT Hyderabad, India), Prasanta Kumar Ghosh (Indian Institute of Science, India)

An Attribute-Aligned Strategy for Learning Speech Representation
(3 minutes introduction)

Yu-Lin Huang (National Tsing Hua University, Taiwan), Bo-Hao Su (National Tsing Hua University, Taiwan), Y.-W. Peter Hong (National Tsing Hua University, Taiwan), Chi-Chun Lee (National Tsing Hua University, Taiwan)

Raw Speech-to-Articulatory Inversion by Temporal Filtering and Decimation
(3 minutes introduction)

Abdolreza Sabzi Shahrebabaki (NTNU, Norway), Sabato Marco Siniscalchi (NTNU, Norway), Torbjørn Svendsen (NTNU, Norway)

Unsupervised Training of a DNN-based Formant Tracker
(3 minutes introduction)

Jason Lilley (Nemours, USA), H. Timothy Bunnell (Nemours, USA)

Unsupervised Training of a DNN-based Formant Tracker
(longer introduction)

Jason Lilley (Nemours, USA), H. Timothy Bunnell (Nemours, USA)

Synchronising speech segments with musical beats in Mandarin and English singing
(3 minutes introduction)

Cong Zhang (Radboud Universiteit, The Netherlands), Jian Zhu (University of Michigan, USA)