InterSpeech 2021

AusKidTalk: An Auditory-Visual Corpus of 3- to 12-year-old Australian Children’s Speech
(3 minutes introduction)

Beena Ahmed (UNSW Sydney, Australia), Kirrie J. Ballard (University of Sydney, Australia), Denis Burnham (Western Sydney University, Australia), Tharmakulasingam Sirojan (UNSW Sydney, Australia), Hadi Mehmood (UNSW Sydney, Australia), Dominique Estival (Western Sydney University, Australia), Elise Baker (Western Sydney University, Australia), Felicity Cox (Macquarie University, Australia), Joanne Arciuli (Flinders University, Australia), Titia Benders (Macquarie University, Australia), Katherine Demuth (Macquarie University, Australia), Barbara Kelly (University of Melbourne, Australia), Chloé Diskin-Holdaway (University of Melbourne, Australia), Mostafa Shahin (UNSW Sydney, Australia), Vidhyasaharan Sethu (UNSW Sydney, Australia), Julien Epps (UNSW Sydney, Australia), Chwee Beng Lee (Western Sydney University, Australia), Eliathamby Ambikairajah (UNSW Sydney, Australia)
Here we present AusKidTalk [1], an audio-visual (AV) corpus of Australian children's speech collected to facilitate the development of speech-based technologies for children. It builds on the technology and expertise developed while collecting an earlier corpus of Australian adult speech, AusTalk [2,3]. This multi-site initiative was established to address the severe shortage, in Australia and around the world, of children's speech corpora large enough to train accurate automated speech processing tools for children. We are collecting approximately 600 hours of speech from children aged 3 to 12 years, including single-word and sentence productions as well as narrative and emotional speech. In this paper, we discuss the key requirements for AusKidTalk and how we designed the recording setup and protocol to meet them. We also report key findings from our feasibility study of the recording protocol, recording tools, and user interface.