InterSpeech 2021

ORCA-SLANG: An Automatic Multi-Stage Semi-Supervised Deep Learning Framework for Large-Scale Killer Whale Call Type Identification
(Oral presentation)

Christian Bergler (FAU Erlangen-Nürnberg, Germany), Manuel Schmitt (FAU Erlangen-Nürnberg, Germany), Andreas Maier (FAU Erlangen-Nürnberg, Germany), Helena Symonds (OrcaLab, Canada), Paul Spong (OrcaLab, Canada), Steven R. Ness (University of Victoria, Canada), George Tzanetakis (University of Victoria, Canada), Elmar Nöth (FAU Erlangen-Nürnberg, Germany)
Identifying animal-specific vocalization patterns is a prerequisite for decoding animal communication. In bioacoustics, passive acoustic recording setups are increasingly deployed to acquire large-scale datasets. Prior knowledge about established animal-specific call types usually exists from earlier research. However, time and human-resource constraints, combined with a lack of machine-based approaches, permit manual analysis of only comparatively small data corpora, which distorts the actual data representation and information value. These data limitations restrict the identification of existing population-, group-, and individual-specific call types and sub-categories, as well as previously unseen vocalization patterns. Machine learning therefore forms the basis for animal-specific call type recognition and for deeper insights into communication. The current study is the first to fuse task-specific neural networks into a fully automated, multi-stage, deep-learning-based framework, entitled ORCA-SLANG, which performs semi-supervised call type identification in one of the largest animal-specific bioacoustic archives, the Orchive. Orca/noise segmentation, denoising, and subsequent feature learning provide robust representations for semi-supervised clustering/classification. The result is a machine-annotated call type data repository containing 235,369 unique calls.
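
The stage order named in the abstract (segmentation, denoising, feature learning, semi-supervised assignment) can be illustrated with a minimal Python sketch. This is not the ORCA-SLANG implementation: every function name, threshold, and the two-number "feature" below is a hypothetical stand-in for a neural network stage, chosen only to show how the stages chain together and how calls that match no known type can be left unlabeled.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

Signal = List[float]
Feature = Tuple[float, float]


def segment(signal: Signal, frame: int = 4, threshold: float = 0.5) -> List[Signal]:
    """Stage 1 stand-in: keep fixed-size frames whose mean absolute
    amplitude exceeds a threshold (assumed proxy for orca vs. noise)."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, frame)]
    return [f for f in frames if mean(abs(x) for x in f) > threshold]


def denoise(frame: Signal, floor: float = 0.1) -> Signal:
    """Stage 2 stand-in: zero out samples below an assumed noise floor."""
    return [x if abs(x) >= floor else 0.0 for x in frame]


def embed(frame: Signal) -> Feature:
    """Stage 3 stand-in: a (mean, standard deviation) summary in place
    of a learned feature representation."""
    return (mean(frame), pstdev(frame))


def assign(feature: Feature, centroids: Dict[str, Feature], max_dist: float) -> Optional[str]:
    """Stage 4 stand-in: label with the nearest known call type centroid,
    or None when nothing is close enough (a candidate unseen pattern)."""
    best_label, best_dist = None, float("inf")
    for label, (cx, cy) in centroids.items():
        dist = ((feature[0] - cx) ** 2 + (feature[1] - cy) ** 2) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_dist else None


def run_pipeline(signal: Signal, centroids: Dict[str, Feature],
                 max_dist: float = 0.5) -> List[Optional[str]]:
    """Chain the four stages in the order given in the abstract."""
    return [assign(embed(denoise(f)), centroids, max_dist) for f in segment(signal)]


if __name__ == "__main__":
    # Toy signal: one quiet frame, two frames near known types, one outlier.
    toy_signal = [0.0, 0.05, 0.0, 0.02,
                  1.0, 1.0, 1.0, 1.0,
                  1.0, -1.0, 1.0, -1.0,
                  5.0, 5.0, 5.0, 5.0]
    known_types = {"A": (1.0, 0.0), "B": (0.0, 1.0)}  # hypothetical labels
    print(run_pipeline(toy_signal, known_types))  # ['A', 'B', None]
```

In this toy run the quiet frame is discarded at the segmentation stage, two frames receive known labels, and the outlier stays unlabeled. The unlabeled case is where a semi-supervised setup would hand over to clustering to look for new call types.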