Odyssey 2020

The Speaker and Language Recognition Workshop

Odyssey 2020: The Speaker and Language Recognition Workshop was hosted by NEC Corporation and Tokyo Institute of Technology in Tokyo, Japan, on November 2-5, 2020. The workshop is an ISCA Tutorial and Research Workshop (ITRW) held in cooperation with the ISCA Speaker and Language Characterization special interest group. For the first time, the event featured a tutorial day, held on November 1, 2020, before the main workshop, which further strengthened Odyssey 2020 as an ITRW.

The need for fast, efficient, accurate, and robust means of recognizing people and languages is of growing importance for commercial, forensic, and government applications. The aim of this workshop is to continue to foster interactions among researchers in speaker and language recognition as the successor of previous successful events held in Martigny (1994), Avignon (1998), Crete (2001), Toledo (2004), San Juan (2006), Stellenbosch (2008), Brno (2010), Singapore (2012), Joensuu (2014), Bilbao (2016) and Les Sables d’Olonne (2018).

Website: http://www.odyssey2020.org


Keynotes

1:12:40

Towards Unsupervised Learning of Speech Representations

Dr. Mirco Ravanelli, Université de Montréal, Canada

0:51:36

The importance of Calibration in Speaker Verification

Luciana Ferrer, Computer Science Institute, Argentina


Live sessions


Tutorials

0:55:26

Anti-spoofing in automatic speaker recognition

Dr Massimiliano Todisco, Eurecom, France

1:38:44

End-to-end speaker recognition — why, when and how to do it?

Dr Johan Rohdin, Brno University of Technology, Czech Republic

1:01:15

Neural speech recognition

Dr Yotaro Kubo and Mr Shigeki Karita, Google Research, Japan


1:02:29

Neural statistical parametric speech synthesis

Dr Xin Wang, National Institute of Informatics, Japan


Speaker Recognition 1


0:21:33

Probabilistic Embeddings for Speaker Diarization

Anna Silnova, Niko Brummer, Johan Rohdin, Themos Stafylakis, Lukas Burget


Speaker and Language Recognition

0:19:39

Zero-Time Windowing Cepstral Coefficients for Dialect Classification

Rashmi Kethireddy, Sudarsana Reddy Kadiri, Santosh Kesiraju, Suryakanth V. Gangashetty

0:13:56

Compensation on x-vector for Short Utterance Spoken Language Identification

Peng Shen, Xugang Lu, Komei Sugiura, Sheng Li, Hisashi Kawai


0:11:52

Improving Embedding-based Neural-Network Speaker Recognition

Po-Chin Wang, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, Shan-Wen Hsiao

0:19:57

Information Preservation Pooling for Speaker Embedding

Min Hyun Han, Woo Hyun Kang, Sung Hwan Mun, Nam Soo Kim

0:19:42

Neural i-vectors

Ville Vestman, Kong Aik Lee, Tomi Kinnunen


0:20:02

Denoising x-vectors for Robust Speaker Recognition

Mohammad Mohammadamini, Driss Matrouf, Paul-Gauthier Noé

0:24:51

Adaptive Mean Normalization for Unsupervised Adaptation of Speaker Embeddings

Mitchell McLaren, Md Hafizur Rahman, Diego Castan, Mahesh Kumar Nandwana, Aaron Lawson



Diarization

0:16:30

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

Qingjian Lin, Weicheng Cai, Lin Yang, Junjie Wang, Jun Zhang, Ming Li

0:15:48

On Early-stop Clustering for Speaker Diarization

Liping Chen, Kong Aik Lee, Lei He, Frank Soong


0:20:24

Linguistically Aided Speaker Diarization Using Speaker Role Information

Nikolaos Flemotomos, Panayiotis Georgiou, Shrikanth Narayanan

0:11:52

Optimal Mapping Loss: A Faster Loss for End-to-End Speaker Diarization

Qingjian Lin, Tingle Li, Lin Yang, Junjie Wang, Ming Li


Spoofing and Countermeasure 1

0:14:50

Generalization of Audio Deepfake Detection

Tianxiang Chen, Avrosh Kumar, Parav Nagarsheth, Ganesh Sivaraman, Elie Khoury



Special Session: VOiCES 2020

0:15:18

The VOiCES from a Distance Challenge 2019: Analysis of Speaker Verification Results and Remaining Challenges

Mahesh Kumar Nandwana, Michael Lomnitz, Colleen Richey, Mitchell McLaren, Diego Castan, Luciana Ferrer, Aaron Lawson

0:17:13

Selective Deep Speaker Embedding Enhancement for Speaker Verification

Jee-Weon Jung, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, Ha-Jin Yu

0:17:14

Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances

Aleksei Gusev, Vladimir Volokhov, Tseren Andzhukaev, Sergey Novoselov, Galina Lavrentyeva, Marina Volkova, Alice Gazizullina, Andrey Shulipa, Artem Gorlanov, Anastasia Avdeeva, Artem Ivanov, Alexander Kozlov, Timur Pekhovsky, Yuri Matveev


0:19:45

Utilizing VOiCES Dataset for Multichannel Speaker Verification with Beamforming

Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Jan Černocký

0:16:54

An Empirical Analysis of Information Encoded in Disentangled Neural Speaker Representations

Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan

0:15:19

NPLDA: A Deep Neural PLDA Model for Speaker Verification

Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy



Voice Conversion and Synthesis

0:19:30

Many-to-Many Voice Conversion Using Cycle-Consistent Variational Autoencoder with Multiple Decoders

Dongsuk Yook, Seong-Gyun Leem, Keonnyeong Lee, In-Chul Yoo


0:12:27

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

0:16:39

Personalized Singing Voice Generation Using WaveRNN

Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das, Haizhou Li



Evaluation and Benchmarking

0:19:43

The 2019 NIST Audio-Visual Speaker Recognition Evaluation

Omid Sadjadi, Craig Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero

0:18:53

The 2019 NIST Speaker Recognition Evaluation CTS Challenge

Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero

0:20:00

Advances in Speaker Recognition for Telephone and Audio-Visual Data: the JHU-MIT Submission for NIST SRE19

Jesus Antonio Villalba Lopez, Daniel Garcia-Romero, Nanxin Chen, Gregory Sell, Jonas Borgstrom, Alan McCree, Leibny Paola Garcia Perera, Saurabh Kataria, Phani Sankar Nidadavolu, Pedro Torres-Carrasquillo, Najim Dehak


0:14:28

LEAP System for SRE 2019 CTS Challenge - Improvements and Error Analysis

Shreyas Ramoji, Prashant Krishnan, Bhargavram Mysore, Prachi Singh, Sriram Ganapathy

0:24:31

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge

Jahangir Alam, Gilles Boulianne, Lukas Burget, Mohamed Dahmane, Mireia Diez Sánchez, Alicia Lozano-Diez, Ondrej Glembek, Pierre-Luc St-Charles, Marc Lalonde, Pavel Matejka, Petr Mizera, Joao Monteiro, Ladislav Mošner, Cedric Noiseux, Ondřej Novotný, Oldřich Plchot, Johan Rohdin, Anna Silnova, Josef Slavicek, Themos Stafylakis, Shuai Wang, Hossein Zeinali


Spoofing and Countermeasure 2

0:26:28

Analysis of Teager Energy Profiles for Spoof Speech Detection

Madhu Kamble, Aditya Krishna Sai Pulikonda, Maddala Venkata Siva Krishna, Hemant Patil


0:14:31

Phase Spectrum of Time-flipped Speech Signals for Robust Spoofing Detection

Sung-Hyun Yoon, Min-Sung Koh, Ha-Jin Yu

0:20:12

Residual Networks for Resisting Noise: Analysis of an Embeddings-based Spoofing Countermeasure

Bence Halpern, Finnian Kelly, Rob van Son, Anil Alexander

0:17:44

An Explainability Study of the Constant Q Cepstral Coefficient Spoofing Countermeasure for Automatic Speaker Verification

Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas Evans, Massimiliano Todisco


0:18:20

Subband Modeling for Spoofing Detection in Automatic Speaker Verification

Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos


Speaker Recognition 2

0:13:17

Delving into VoxCeleb: Environment Invariant Speaker Recognition

Joon Son Chung, Jaesung Huh, Seongkyu Mun

0:14:58

Dropping Classes for Deep Speaker Representation Learning

Chau Luu, Peter Bell, Steve Renals

0:19:57

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng


0:18:38

Partial AUC Metric Learning Based Speaker Verification Back-End

Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen


Speech Application

0:20:01

Joint Training End-to-End Speech Recognition Systems with Speaker Attributes

Sheng Li, Xugang Lu, Raj Dabre, Peng Shen, Hisashi Kawai

0:13:27

Small Footprint Multi-channel Keyword Spotting

Jilong Wu, Yiteng Huang, Hyun-Jin Park, Niranjan Subrahmanya, Patrick Violette


0:18:20

Speaker Detection in the Wild: Lessons Learned from JSALT 2019

Leibny Paola Garcia Perera, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak


0:15:05

Personal VAD: Speaker-Conditioned Voice Activity Detection

Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno


0:23:50

Speech Bandwidth Expansion For Speaker Recognition On Telephony Audio

Ganesh Sivaraman, Amruta Vidwans, Elie Khoury

0:20:02

Analysis of Deep Feature Loss Based Enhancement for Speaker Verification

Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Najim Dehak