Odyssey 2012

The Speaker and Language Recognition Workshop

Generalized Viterbi-based Models for Time-Series Segmentation Applied to Speaker Diarization

Presented by: Itshak Lapidot
Itshak Lapidot and Jean-Francois Bonastre

Time-series clustering takes into account the chronological order of the input samples. In time-series clustering, therefore, the samples are not processed independently: the result for a given sample depends on the clustering of the whole sequence. One of the most popular clustering algorithms for handling such dependencies is the well-known Hidden Markov Model (HMM) trained with Viterbi statistics. In this work we propose a generalization of the widely used HMM, denoted Hidden Distortion Models (HDMs). Our proposal is based on distortion-based models and transition counts, for which probabilistic calculations are no longer mandatory. We will introduce our approach through its mathematical foundations. It will be shown that the Viterbi-based HMM can be seen as a special case of the HDM. This close relationship allows us to apply similar approaches to state-model training when the new paradigm is used to learn the sequence dependencies. A speaker diarization application will be presented to show the advantages of the HDM as a clustering algorithm.
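The abstract gives no equations, so the following is only a minimal sketch of the general idea, assuming that each state s scores a frame x_t with a distortion d(x_t, s) and that each transition i -> j carries a cost derived from transition counts. Decoding then finds the state sequence with the lowest total cost using a Viterbi-style dynamic program. If d(x, s) = -log p(x | s) and the transition cost is -log a_ij, the same recursion is ordinary log-domain Viterbi decoding for an HMM, which illustrates how a Viterbi-based HMM can appear as a special case. The function names (`viterbi_min_cost`, `make_squared_distortion`, `make_count_transition_cost`), the squared Euclidean distortion, and the -log relative-frequency transition cost are hypothetical illustration choices, not the authors' HDM definitions.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]


def viterbi_min_cost(
    frames: Sequence[Frame],
    num_states: int,
    distortion: Callable[[Frame, int], float],
    transition_cost: Callable[[int, int], float],
) -> List[int]:
    """Return the state sequence minimizing summed distortion plus transition cost."""
    T = len(frames)
    if T == 0:
        return []
    # cost[t][s]: lowest accumulated cost of any path ending in state s at frame t
    cost = [[math.inf] * num_states for _ in range(T)]
    # back[t][s]: predecessor state on that lowest-cost path
    back = [[0] * num_states for _ in range(T)]
    for s in range(num_states):
        cost[0][s] = distortion(frames[0], s)
    for t in range(1, T):
        for s in range(num_states):
            # Cheapest predecessor p, including the cost of moving from p to s.
            best_prev = min(range(num_states),
                            key=lambda p: cost[t - 1][p] + transition_cost(p, s))
            cost[t][s] = (cost[t - 1][best_prev] + transition_cost(best_prev, s)
                          + distortion(frames[t], s))
            back[t][s] = best_prev
    # Backtrack from the cheapest final state.
    path = [min(range(num_states), key=lambda s: cost[T - 1][s])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def make_squared_distortion(centroids: Sequence[Frame]) -> Callable[[Frame, int], float]:
    """Hypothetical distortion: squared Euclidean distance to each state's centroid."""
    def distortion(x: Frame, s: int) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[s]))
    return distortion


def make_count_transition_cost(counts: Sequence[Sequence[int]]) -> Callable[[int, int], float]:
    """Hypothetical transition cost: -log of the count-based relative frequency."""
    def transition_cost(i: int, j: int) -> float:
        total = sum(counts[i])
        return -math.log(counts[i][j] / total) if counts[i][j] > 0 else math.inf
    return transition_cost


if __name__ == "__main__":
    # Two well-separated 1-D "speakers" with centroids 0.0 and 5.0.
    frames = [[0.0], [0.2], [-0.1], [5.0], [4.8], [5.2]]
    path = viterbi_min_cost(frames, 2,
                            make_squared_distortion([[0.0], [5.0]]),
                            make_count_transition_cost([[8, 2], [2, 8]]))
    print(path)  # [0, 0, 0, 1, 1, 1]
```

Keeping the distortion and transition cost as plug-in functions makes the non-probabilistic case (any distortion, any count-based penalty) and the probabilistic HMM case share one decoder.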