0:00:15 This work was done by Jeff and me.
0:00:20 As you have seen, there are no i-vectors here; we really tried to put i-vectors in, but we didn't find where to put them.
0:00:32 We don't claim that this is the state of the art.
0:00:42 Sometimes, to do something new, the best thing is to use something very old that everyone has forgotten about.
0:00:51 So this work basically goes back to 1955, when HMMs were first, approximately, defined.
0:01:03 At that time, two types of HMMs were defined. One is well known: we have transitions from one state to another, and at each state a distribution that defines the distribution of the data in that state; this one is named the Moore HMM.
0:01:29 In the other type, the distribution depends on where the data came from, so both the transition probability and the distribution of the data are on the arcs, not at the states; this one is named the Mealy HMM.
0:01:51 In the control-systems community they worked a lot on both types of HMMs, but they did not try to estimate the parameters or find the best path with Viterbi.
0:02:08 They worked more on discrete distributions and asked questions such as: what is an equivalent model, or what is the minimal HMM that can be found?
0:02:23 We will look at the Mealy HMM from another perspective, compare it to the Moore HMM, the HMM we know, and try to apply it to diarization of telephone conversations.
0:02:40 So, I will give a short summary of the HMM, just to set up the notation, not to say anything new.
0:02:48 Then I will present the Mealy HMM, show how we applied it to speaker diarization, and some results will follow.
0:03:03 So, in the HMM we know, if we have a K-state model, it is defined by the initial probability vector, the transition matrix, and the vector of state distributions; in the GMM case, each state distribution is a GMM.
0:03:29 So the triple (pi, A, B) defines the model.
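As a minimal sketch of this triple, under toy assumptions of my own (1-D data, two-component per-state GMMs; names like `emission` are mine, not from the talk):

```python
import numpy as np

# Minimal sketch of a K-state Moore HMM (pi, A, B):
# pi[k]   - initial probability of state k
# A[i, j] - transition probability from state i to state j
# B[k]    - emission distribution of state k (a GMM in the talk;
#           here a toy 2-component 1-D GMM per state)
K = 3
pi = np.full(K, 1.0 / K)          # uniform start
A = np.full((K, K), 1.0 / K)      # fully connected (ergodic) transitions
B = [dict(w=np.array([0.5, 0.5]),
          mu=np.random.randn(2),
          var=np.ones(2)) for _ in range(K)]

def emission(b, y):
    """b_k(y): likelihood of a scalar observation y under one state's GMM."""
    g = np.exp(-0.5 * (y - b["mu"]) ** 2 / b["var"]) / np.sqrt(2 * np.pi * b["var"])
    return float(b["w"] @ g)
```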
0:03:38 In the Moore HMM, as is well known, there are three problems: to compute the probability, the likelihood of the model given the data; the Viterbi problem, to find the best path; and to estimate the model, which we can do via Viterbi statistics or via Baum-Welch.
0:04:07 In our case, in diarization, we are more interested in the Viterbi statistics.
0:04:14 The motivation to use the Mealy HMM can be seen in this toy example.
0:04:22 Suppose we are looking at state 2. Little data arrives from state 1 to state 2, only two hundred points; on the other hand, from state 3 to state 2 arrives much more data, nine times more.
0:04:53 The distributions of the data arriving from each state are different Gaussians.
0:05:04 But if we try to estimate a GMM of size two for state 2, it will basically look almost like the data that arrives from state 3; the state 1 data will have very small influence on the distribution.
0:05:29 So we want to emphasize the data which derived from state 1, so that the transition into this state is proper when the data came from state 1.
0:05:53 These are the distributions of the two Gaussians, from state 1 and from state 2, and above, each of them multiplied by the transition probability.
0:06:16 We can see that we can never have any transition from state 1 to state 2, because the blue line is above: we will always decide to stay at state 1.
0:06:42 But if we look only at the data from the transitions on the arcs, at that specific data, then we see a totally different picture: it is much preferable to move from state 1 to state 2 than to stay at state 1, which is the blue line.
0:07:06 So if we have a specific distribution on each arc, and not at the state level, we can better decide to move from one state to another than when we assume that the data in a state has the same distribution no matter from which previous state we arrive.
0:07:37 This was the motivation to try to move from the Moore HMM to the Mealy HMM.
0:07:48 In this case we define our model as follows. We have an initial vector, but that initial vector is not a vector of probabilities; it is a vector of pdfs, of distribution functions, so it depends also on the data, not only on which state you are going to.
0:08:12 And we have a matrix A(y), which is again a matrix of functions; they depend on from which state to which state the transition goes, and also on the data. So now we have a model which is a couple of only pi(y) and A(y).
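A minimal sketch of that couple, again under toy assumptions of my own (single Gaussians per arc instead of the GMMs used in the talk; `pi_fn` and `A_fn` are hypothetical names):

```python
import numpy as np

def gauss(y, mu, var):
    """Toy 1-D Gaussian pdf, broadcast over arrays of means/variances."""
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

K = 3
init_mu, init_var = np.random.randn(K), np.ones(K)        # one pdf per state
arc_mu, arc_var = np.random.randn(K, K), np.ones((K, K))  # one pdf per arc

def pi_fn(y):
    """Initial vector of pdfs: pi_k(y), one value per state."""
    return gauss(y, init_mu, init_var)

def A_fn(y):
    """Arc matrix of functions: A_ij(y) for every transition i -> j."""
    return gauss(y, arc_mu, arc_var)
```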
0:08:41 We have the same three problems as in the Moore HMM: to compute the likelihood of the model given the data, to find the best path, and to estimate the parameters, via Viterbi statistics or Baum-Welch.
0:09:01 Again, Baum-Welch is not the focus of this talk; we will touch on it just a little bit later.
0:09:13 So we can see that if we want to compute the likelihood, it becomes very easy: it is just the product of the initial vector multiplied by the matrices, and then, to sum it, we multiply by a row vector of ones.
0:09:38 If we compare this to the Moore HMM, there of course we have to sum over all the possible paths, and we use the forward or backward coefficients to do it; but still, that recursion is much more complex than the matrix multiplication that we have in the Mealy representation.
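A sketch of that matrix-product likelihood, reusing the hypothetical `pi_fn`/`A_fn` from the previous sketch:

```python
def mealy_likelihood(y_seq, pi_fn, A_fn):
    """Likelihood of a Mealy HMM as a chain of matrix products:
    P(y_1..y_T) = pi(y_1)^T * A(y_2) * ... * A(y_T) * 1."""
    v = pi_fn(y_seq[0])        # row vector pi_k(y_1)
    for y in y_seq[1:]:
        v = v @ A_fn(y)        # one K-by-K matrix per observation
    return v.sum()             # the final multiply by a vector of ones
```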
0:10:12 Finding the best Viterbi path is also a known problem: we just have to take this product over the best transitions, and we want to maximize, with an argmax, over the sequence of states we really want to have.
0:10:40 I will go through it briefly, because it is well known. At each time stamp we have a vector of the best likelihoods of the partial sequences, and a vector of the states we derived from, just as in the Moore case.
0:11:09 We initialize the delta vector and the psi vector very simply.
0:11:20 But the recursion becomes very simple, much simpler than it was in the Moore HMM: we just have an element-wise product between the vector of the previous likelihoods and the corresponding entries of the matrix A(y_t). We take the maximum of these products, which gives the maximum likelihood of the path, and the argmax gives the previous state we came from.
0:12:04 And then, like in the Moore HMM, we have the termination and backtracking; that recursion does not change at all.
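A sketch of that recursion, again on top of the hypothetical `pi_fn`/`A_fn` above; it is the familiar Moore Viterbi with the state emission b_j(y) folded into the arc functions A_ij(y):

```python
import numpy as np

def mealy_viterbi(y_seq, pi_fn, A_fn):
    """Best state path in a Mealy HMM."""
    T, delta = len(y_seq), pi_fn(y_seq[0])         # delta_1(k) = pi_k(y_1)
    psi = np.zeros((T, delta.size), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] * A_fn(y_seq[t])   # delta_{t-1}(i) * A_ij(y_t)
        psi[t] = scores.argmax(axis=0)             # best predecessor of each j
        delta = scores.max(axis=0)                 # delta_t(j)
    path = [int(delta.argmax())]                   # termination
    for t in range(T - 1, 0, -1):                  # backtracking, unchanged
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta.max())
```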
0:12:20 If we want to estimate the parameters using Viterbi statistics, this is the cost function.
0:12:37 The difference from the Moore case is in the Lagrange multiplier: now we have a different constraint. It is not that the weights of one GMM have to sum to one; the summation to one has to be over all the weights of all the transitions from a state. If we are leaving state 1, we take all the weights of the self-loop on state 1, plus all the weights to state 2 and to state 3.
0:13:13 This is the only difference.
0:13:18 And at the end it converges to a very simple recursion: we just look at the data on each transition and train a GMM on it, like we do in the Moore case, but then we have to scale the weights of each GMM by this fraction.
0:13:44 Everyone here knows what this fraction is, yes?
0:13:53 This fraction is actually the same as the transition probability in the Moore HMM.
0:14:03 So we can see that what sits on each arc is not a pdf, but a pdf multiplied by the probability of the transition.
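A sketch of that re-estimation step, with hypothetical names; the fraction in question is a_ij = N_ij / sum over j' of N_ij', the count-based Moore transition probability:

```python
import numpy as np

def scale_arc_weights(arc_gmm_w, N):
    """Viterbi-statistics update (sketch): rescale the mixture weights of
    the GMM trained on each arc i -> j by
        a_ij = N_ij / sum_j' N_ij'
    where N[i, j] counts the Viterbi frames assigned to arc i -> j, so each
    arc ends up carrying a pdf times a transition probability.
    arc_gmm_w[i][j] is the weight vector of the GMM trained on arc i -> j."""
    a = N / N.sum(axis=1, keepdims=True)   # per-row transition fractions
    return [[arc_gmm_w[i][j] * a[i, j] for j in range(N.shape[1])]
            for i in range(N.shape[0])]
```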
0:14:22 If we want to do Baum-Welch: I will not give the equations here, they are big and ugly, and there is not much information in them; I will just show that we have to define the hidden variables a little bit differently.
0:14:39 We need hidden variables for the initial state: we define z_km(1) to be one if the m-th mixture of the k-th initial state emitted the first observation x_1, and similarly we define the hidden variables for any other time which is not one.
0:15:13 Then the question can arise: does it really matter whether you use a Moore HMM or a Mealy HMM?
0:15:24 Yes and no.
0:15:28 Yes, because we will see shortly that it makes life easier. No, because it was shown already that any Moore HMM can be represented as a Mealy HMM, and vice versa: any Mealy HMM can be represented as a Moore HMM.
0:15:51 So, if we define the set of all possible sequences, taking the example of binary sequences, where every value can be only zero or one, then X* is the set of all possible sequences.
0:16:13 A string probability P is then a mapping from X* to [0, 1].
0:16:22 And we can define equivalence between two models, which can be both Moore, both Mealy, or one Moore and one Mealy: two models are defined to be equivalent if their string probabilities P and P' satisfy P = P'.
0:16:56 Then we can define the Moore-minimal model: among the equivalent models, it is the one that has the smallest number of states.
0:17:15 The Mealy-minimal model is defined the same way: if we have two equivalent Mealy models with the same string probability P, the minimal one is the one that uses the smaller number of states.
0:17:32 It is still an open question how to find the minimal model.
0:17:40 What is more interesting is that it can be shown that for any K-state Moore HMM we can find an equivalent Mealy HMM with no more than K states.
0:17:57 But vice versa it is not so easy: for a K-state Mealy HMM it can happen that the minimal Moore model has K-squared states, so the number of states grows with the power of two.
0:18:17 It is very easy to show how to move from Moore to Mealy: you just put on each arc the pdf of the destination state multiplied by the transition probability, and you have an equivalent model, as in the sketch below.
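A sketch of this easy direction, built on the hypothetical Moore pieces from the first sketch: each arc i -> j gets A_ij(y) = a_ij * b_j(y):

```python
import numpy as np

def moore_to_mealy(A, B, emission):
    """Moore -> Mealy (the easy direction): put on each arc i -> j the
    destination state's pdf times the transition probability,
        A_ij(y) = a_ij * b_j(y),
    and return the resulting Mealy arc-matrix function."""
    def A_fn(y):
        b = np.array([emission(b_j, y) for b_j in B])  # b_j(y) per state
        return A * b[None, :]                          # broadcast over rows
    return A_fn
```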
0:18:34 But if we are going from the Mealy HMM to the Moore HMM, we have to build a structure where part of the transitions are zero, and specify in a very precise way how to build it.
0:18:53 I am not sure that this will be the minimal Moore model, but it was shown that this Moore model is equivalent to the Mealy model; however, we increase the transition matrix, and so on.
0:19:06 This is in the case when we know which state belongs to which event. If we do not know, we will have to somehow estimate it, whether state 1 belongs to event 1 and state 2 to event 2, and that is not very simple.
0:19:27 We applied this to speaker diarization. We have voice activity detection, overlapped-speech removal, initialization of the HMM, and then we apply fixed-duration HMM clustering, both for the Mealy and for the Moore model.
0:19:51 The minimum duration was two hundred milliseconds, which means we stay twenty states in the same model.
0:20:02 So we have three hyper-states, for speaker 1, speaker 2 and non-speech, because we know that this is a telephone conversation: we know in advance that there are only two speakers.
0:20:14 In the case of the Moore HMM, this is the picture: in our case we stay twenty times in the same model, and then we can transition to any other.
0:20:28 In the Mealy HMM it is very similar, but now we stay in one model D minus one, that is nineteen, times, and we now have the distributions on the transition arcs.
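A sketch of such a minimum-duration topology, under assumptions of my own (10 ms frames, so 200 ms corresponds to D = 20 sub-states, plus an arbitrary exit probability); this is not the exact construction from the talk:

```python
import numpy as np

def min_duration_topology(n_hyper=3, D=20, p_stay=0.9):
    """Transition matrix for n_hyper hyper-states (speaker 1, speaker 2,
    non-speech), each a chain of D sub-states enforcing a minimum stay
    of D frames; only the last sub-state can loop on itself or leave."""
    K = n_hyper * D
    A = np.zeros((K, K))
    for h in range(n_hyper):
        s = h * D
        for d in range(D - 1):
            A[s + d, s + d + 1] = 1.0        # forced forward moves
        A[s + D - 1, s + D - 1] = p_stay     # stay in the same model
        exits = [g * D for g in range(n_hyper) if g != h]
        for e in exits:                      # jump to another hyper-state
            A[s + D - 1, e] = (1.0 - p_stay) / len(exits)
    return A
```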
0:20:48 The results were on an LDC database: one hundred and eight conversations of approximately ten minutes each.
0:21:03 We tried different models.
0:21:08 For the Moore case, models of twenty-one and twenty-four Gaussians with full covariance; the bigger model gave the best results. Above twenty-four Gaussians the results dropped, so we do not show them here.
0:21:23 Then we tried different Mealy HMM models.
0:21:31 On the left side we see the total number of Gaussians that we have in the whole HMM, and on the right side the diarization error rate.
0:21:44 We can see, basically, that we have more GMMs to estimate, but we can achieve the same results as with the Moore HMM with about twenty percent fewer Gaussians overall.
0:22:04 Why? Because we are able to model the data on the transitions.
0:22:12 We cannot be sure that speaker 1, when he speaks after speaker 2, has the same dynamics as, say, when he starts speaking after silence; maybe he speaks differently, and we want to capture these transition effects, so we model them with the distributions on the arcs.
0:22:39 So we can get the same results with fewer Gaussians, or a little bit better results when we use more Gaussians.
0:22:55 So, we presented the Mealy HMM, showed that it works similarly, and showed the relation between the Mealy and Moore representations.
0:23:09 We saw that we can do telephone diarization without any loss of performance when we use the Mealy HMM, and even get better performance with less complexity.
0:23:30 We know that the HMM is usually, though not always, used as a standalone diarization system; but also, when we use BIC-based diarization, we have a refinement at the end, which is done by an HMM.
0:23:50 We know that in i-vector-based diarization, between phase one and phase two, there is an HMM that makes the re-segmentation. We can replace the Moore HMM with a Mealy HMM and maybe get some improvement in these systems.
0:24:14 So, this is the last thing I wanted to say.
0:24:20 Thank you.
0:24:40 Do we have questions? A question over there.
0:24:46 So, in speaker diarization we usually use GMMs, right, which is well known, and you are using an ergodic HMM; can you comment on the advantage of using an ergodic approach?
0:25:07 In diarization we do not use plain GMMs alone; in a system like this it is an HMM, and it is ergodic because we assume that we can move from each speaker to each speaker, so it is built in an ergodic way.
0:25:28 And, related to that, what are the state distributions now?
0:25:32 The state distributions so far were GMMs, and here they are also realized with GMMs, but on the arcs instead of at the states; we stay with GMMs.
0:25:47 Okay, but you are not using the notion of, you know, the universal background model?
0:25:52 No, we don't use it, because we work with several companies, and when we tried to get data for a universal background model, they said that they have no data; the channels are changing very much, and maybe they can give us one or one and a half hours of data.
0:26:22 I am not sure that we can build a very good UBM when we use only one or even two hours of data.
0:26:32 So we use a standalone model that does not rely on some background model; but if there is a background model for which we have the data, we can use an extended HMM, for example an i-vector-based system, and just encapsulate the GMMs as part of it.
0:26:54 It's not a problem.
0:27:00 The next paper is on broadcast data, so it may have more details for you.