Cisco's Speaker Segmentation and Recognition System

Presented by:

Sachin Kajarekar

Author(s):

Sachin Kajarekar, Aparna Khare, Matthias Paulik, Neha Agrawal, Panchi Panchapagesan, Ananth Sankar and Satish Gannu

This paper presents Cisco's speaker segmentation and recognition (SSR) system, which is part of a commercial product. Cisco SSR combines speaker segmentation and speaker recognition algorithms with a crowdsourcing approach to create speaker metadata. This metadata makes enterprise videos more accessible and easier to navigate, both on its own and in combination with other forms of metadata such as keywords. The paper illustrates the functional blocks of SSR and a typical user interface, and describes the specific implementations of the speaker segmentation and recognition algorithms. It also describes the evaluation data, protocols, and results for both the speaker segmentation and speaker recognition tasks. Speaker segmentation results show that Cisco SSR performs comparably to the state of the art on RT-03F data. Speaker recognition results show that a small set of user-provided labels can be effectively transferred to a continuously expanding set of videos.
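The abstract does not say how user-provided labels are carried over to new videos. Purely as an illustration, the Python sketch below assumes that each speaker cluster produced by segmentation is summarized by a fixed-length embedding vector. It further assumes that a user's labels for one speaker are pooled by averaging, and that an unlabelled cluster receives a name only when its cosine similarity to some labelled speaker model clears a threshold. The functions `enroll` and `transfer_labels`, the embedding representation, and the threshold are hypothetical and are not Cisco SSR's actual implementation.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def enroll(labeled: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Hypothetical enrollment: average each user-labelled speaker's cluster embeddings."""
    models: Dict[str, List[float]] = {}
    for name, vecs in labeled.items():
        dim = len(vecs[0])
        models[name] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return models


def transfer_labels(
    models: Dict[str, List[float]],
    clusters: List[Vector],
    threshold: float = 0.7,
) -> List[Optional[str]]:
    """Give each unlabelled cluster the best-scoring speaker name,
    or None when no speaker model reaches the (assumed) threshold."""
    out: List[Optional[str]] = []
    for c in clusters:
        best_name: Optional[str] = None
        best_score = -math.inf
        for name, m in models.items():
            s = cosine(m, c)
            if s > best_score:
                best_name, best_score = name, s
        out.append(best_name if best_score >= threshold else None)
    return out


if __name__ == "__main__":
    models = enroll({"alice": [[1.0, 0.0], [0.9, 0.1]], "bob": [[0.0, 1.0]]})
    print(transfer_labels(models, [[0.95, 0.05], [0.1, 0.9], [0.7, 0.7]], threshold=0.9))
```

With a threshold of 0.9, the first two clusters are matched to `alice` and `bob`, while the third (roughly equidistant from both) stays unlabelled. Leaving low-confidence clusters unnamed is one way a system could keep a few labels from spreading errors as the video collection grows.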

Odyssey 2012

The Speaker and Language Recognition Workshop

