Telephone Conversation Speaker Diarization Using Mealy-HMMs

Itshak Lapidot, Jean-Francois Bonastre and Samy Bengio

When Hidden Markov Models (HMMs) were first introduced, two competing representations were proposed: the Moore model, with separate emission and transition distributions, which is commonly used in speech technologies, and the Mealy model, with a single emission-transition distribution. Since then, the literature has mostly focused on the Moore model. In this paper, we show the use of Mealy-HMMs for the telephone conversation speaker diarization task. We present Viterbi training and decoding for Mealy-HMMs and show that they yield performance similar to Moore-HMMs with fewer parameters.
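
To make the Mealy formulation concrete, the following is a minimal sketch of Viterbi decoding in which each arc i -> j carries one joint emission-transition log-score, rather than a separate transition term and state emission term as in a Moore-HMM. It is an illustration under stated assumptions, not the authors' implementation: the function name `mealy_viterbi`, the inputs `log_init` and `log_arc`, and the convention that the first frame is folded into `log_init` are all hypothetical.

```python
def mealy_viterbi(log_init, log_arc):
    """Viterbi decoding for a Mealy-style HMM (illustrative sketch).

    log_init[j]      : assumed log-score of starting in state j,
                       with the first frame already included.
    log_arc[t][i][j] : assumed joint log-score of moving i -> j while
                       emitting frame t + 1 (emission lives on the arc).
    Returns the best state path and its log-score.
    """
    n = len(log_init)
    delta = list(log_init)   # best log-score of any path ending in each state
    backptrs = []            # best predecessor of each state, one list per step

    for arc in log_arc:
        new_delta, ptr = [], []
        for j in range(n):
            # One joint arc score replaces the Moore pair
            # (transition score + emission score of the target state).
            best_i = max(range(n), key=lambda i: delta[i] + arc[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + arc[best_i][j])
        delta = new_delta
        backptrs.append(ptr)

    # Backtrack from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    best_score = delta[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_score
```

In a diarization setting, each state would typically stand for a speaker (or non-speech) and the returned path would give the frame-level segmentation; how the arc scores are modeled and trained is specific to the paper and is not reproduced here.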

Odyssey 2014

The Speaker and Language Recognition Workshop
