Interspeech 2021

GlobalPhone Mix-to-Separate out of 2: A Multilingual 2000 Speakers Mixtures Database for Speech Separation
(Oral presentation)

Marvin Borsdorf (Universität Bremen, Germany), Chenglin Xu (NUS, Singapore), Haizhou Li (NUS, Singapore), Tanja Schultz (Universität Bremen, Germany)
Monaural speech separation has been well studied on various databases. However, these databases mostly contain English speech. Research in multi-speaker scenarios, such as speech recognition, speaker recognition, speaker diarization, and speech separation, calls for speaker mixture databases comprising multiple languages. In this paper, we propose a new, extensive multilingual database for speech separation tasks derived from the GlobalPhone 2000 Speaker Package, called “GlobalPhone Mix-to-Separate out of 2” (GlobalPhoneMS2). We describe the construction of the database and conduct speech separation experiments in monolingual and multilingual settings, on both seen and unseen languages. When trained on a multilingual dataset, the networks improve their performance on unseen languages and on almost all seen languages. We show that replacing a monolingual dataset with a trilingual one, while keeping the data size roughly the same, improves performance in most cases. We attribute this to greater diversity in speech, language, speaker, and recording characteristics. Based on the GlobalPhoneMS2 database, speech separation results for two-speaker mixing scenarios are reported in 22 spoken languages for the first time.