0:00:13and a causes b well i think we're interest and therefore above is the speaker
0:00:17recognition for telephone and audio-visual data
0:00:20usually my these submission form is design a
0:00:24this is a during war from distances on the human language than only standard estimators
0:00:29and standard orders
0:00:30it's and language processing from the two in my feeling
0:00:36these assigning the income tax
0:00:38CTS which is telephone speech in Tunisian Arabic
0:00:42and the audio-visual condition composed of videos from the internet taken from the VAST
0:00:46corpus
0:00:47there is one that have a speaker recognition on face recognition only working model of
0:00:52this
0:00:53all also used and what database or some formula one and very well why don't
0:00:58you lda or cosine scoring okay
0:01:01in the other side the key points for women for still you what do you
0:01:05really
0:01:06not is gonna the nn are businessmen
0:01:10still be lots kind of a place the nn vectors
0:01:13also
0:01:14but cannot they still using in the melee but mostly from any estimator fine tuning
0:01:19to in domain data also
0:01:22one will be assigned the key points where usage of rain is that are based
0:01:28bodies
0:01:29okay we use cosine scoring and several strategies to combine the
0:01:33embeddings from different video frames
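Note: the cosine scoring mentioned here can be sketched as follows. This is a minimal illustration, not the speaker's implementation. It assumes each recording or face is represented by a fixed-length embedding vector; the function name `cosine_score` is hypothetical.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between two embedding vectors (lists of floats).

    Higher values mean the two embeddings point in more similar directions,
    which is read as evidence that they come from the same person.
    """
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_e = math.sqrt(sum(a * a for a in enroll))
    norm_t = math.sqrt(sum(b * b for b in test))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_e * norm_t)
```

Because the score depends only on the angle between the vectors, rescaling either embedding leaves it unchanged.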
0:01:38again we use this on the i
0:01:40what i will face acoustic features that similar well for be overcome based detection
0:01:48problem that we just a speaker
0:01:50or face images
0:01:52and do not isn't based on but when there walking and we kind of in
0:01:56this course
0:02:00we start describing the audio systems
0:02:04so we're starving different acoustic features we use and this is used for units vectors
0:02:09and build a lattice for rest of this vectors
0:02:12it's be we use community vad or sixty s and you don't and v for
0:02:17really
0:02:18in video we constantly system
0:02:23so what we'll from there is a sin was clustering or a be lda gmm
0:02:27that a single speaker factors in speaker labels posteriors
0:02:33we used to estimate of labels
0:02:35based on similar you know and as the best one double will make me but
0:02:39is not very sure would be is generally
0:02:42also on responsiveness might consider are less money is
0:02:48this and that was one based on god i
0:02:53we got some improvement will is but i during
0:02:55i seriously we're finding the for the n and then what we in domain data
0:03:00just finding the leslie using four letter words in this way embodies becomes a sinus
0:03:06or
0:03:07and we call this
0:03:10besides discriminant percent
0:03:15so we have several x-vector architectures
0:03:19we have
0:03:21i was gonna be and then but since
0:03:23three five basis
0:03:25than better since what we're gonna since the is the same that we use of
0:03:29and sre
0:03:31the contains translators from new domain
0:03:35with a linear size of
0:03:36one thousand four
0:03:39alright is an utterance
0:03:42we unknown
0:03:45and therefore based on find a we have regulators five miles away two thousand forty
0:03:52eight
0:03:53are very agreements
0:03:56we also several possible ways that five questions
0:04:00they're having less than wireless the inverse there wasn't one the and that's always been
0:04:05feeding
0:04:08this is a summary of the datasets used for training the x-vectors
0:04:14so it's in serious condition
0:04:17zero use switchboard was designed for okay
0:04:21r c of this work
0:04:23it's
0:04:24there isn't or is something the in work we use all the data set someone
0:04:29one their completion
0:04:31a is evident in a we use the same but with the model
0:04:36we remove the so systems i one microphone
0:04:41lincoln labs the still use businesses
0:04:45microphone
0:04:48confrontation or this though
0:04:50we used as i e one
0:04:52and i'm gonna is this study
0:04:55and you state
0:04:57or they are all from being the one d c and we just use the
0:05:01most of the thing in this
0:05:06we have a for like principal equations
0:05:09c l is the only one last use the first configuration that's the line of
0:05:13the you're
0:05:16let's say that we have some out-of-domain and some in-domain data
0:05:21first we and that the out-of-domain in domain using their or a little
0:05:26and they're all in an out-of-domain data in
0:05:30then we use different centering for in-domain and
0:05:35out-of-domain data
0:05:37we use common whitening
0:05:38then the PLDA is
0:05:41adapted to in-domain data
0:05:45and then the score normalization with in-domain data and calibration
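Note: the score normalization step mentioned here can be illustrated with a symmetric normalization (S-norm). The talk does not say which variant was used, so S-norm is an assumption, as are the function name and the idea that the cohort scores come from in-domain recordings.

```python
from statistics import mean, pstdev


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (S-norm), shown as an assumed variant.

    enroll_cohort_scores: scores of the enrollment side against a cohort.
    test_cohort_scores:   scores of the test side against the same cohort.
    The raw score is z-normalized against each set and the two are averaged.
    """
    mu_e, sd_e = mean(enroll_cohort_scores), pstdev(enroll_cohort_scores)
    mu_t, sd_t = mean(test_cohort_scores), pstdev(test_cohort_scores)
    if sd_e == 0.0 or sd_t == 0.0:
        raise ValueError("cohort scores must not be constant")
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)
```

Using a cohort drawn from the target domain is what lets the normalization compensate for the shift between training and evaluation conditions.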
0:05:51but for steely and have a three by conventions
0:05:55something that and use for that all lda
0:05:58and the use yes everyday the lda for a swear and very nice thing what's
0:06:04almost instantly
0:06:05are also in the scoring or
0:06:09we also the lda for cases where
0:06:13and then it is then we only the model in salt
0:06:16or
0:06:20so this is a this what are the values something the markets
0:06:24that's a small difference between sites
0:06:27but as forces us ordinance yuri a on
0:06:32the use this study for then i x values to some well on this study
0:06:36in u one
0:06:38for the dc one
0:06:41as you use the is something at you by
0:06:45we also and since the only problem we also use the unlabeled
0:06:52data which we label by doing clustering
0:06:56or other score normalization we use the only really
0:07:00i'm use the sre seen that for
0:07:05or maybe a we just think that can almost the latter
0:07:08this is a very good speakers in the white honestly demos data
0:07:13score by bayesian also us
0:07:16the i have to be also provided us a significant improvement
0:07:21a value will use this i think bias you one for calibration
0:07:29that's you know this used the silence
0:07:32first we analyze the us also that five million and
0:07:37romana something the we use
0:07:40where a source false or misleading there
0:07:44on the on the lower a sliding i b d one all the
0:07:49the base then system used unsupervised really in a bayesian with this study only
0:07:56then in the signal were we is that in the u one okay
0:08:01provides a very nice improvement
0:08:04then we i we are noise segmentation lately
0:08:09that improves the convince your in the u
0:08:13then we have that the a spectrum and also
0:08:16and the in domain be i get some room and you the by a small
0:08:20improvement
0:08:21all in one
0:08:23i think that if we change that sure or then run your that's where we
0:08:28made the grade on our way we
0:08:31getting some
0:08:32implementing that you well limbaugh an improvement in the
0:08:39also analysis on this you by also versa before rest
0:08:44the bayesian network use a risk of a system for based silence mean versus evaluation
0:08:49will also must present a unique
0:08:54then we alignments unless something dusty the data
0:08:58provides a nice improvement in the u and it again
0:09:01then we increase the number of channels in the network and that
0:09:06provides a small improvement
0:09:09not remote really okay and we define the never will always unusable sinus fourteen
0:09:15so on without use of us more ergonomically baseline but in there about their grace
0:09:22and they always fits to the or something or thirteen data
0:09:30and that's was in those identity
0:09:35these are also all four to all the single system
0:09:40the based system is your five better results before was one of the database sinus
0:09:45ability have okay
0:09:48so we're very close to be easily affected formal system for which channels
0:09:51a personal one of the
0:09:56and
0:09:57for this part of the nn with the
0:10:00will be the training set
0:10:03in all cases you was greater than this method was i
0:10:12we applied several
0:10:15methods for the fusion we have there
0:10:19but it's a you don't use of in it was used in calibration and yes
0:10:23is for a basis for
0:10:26an efficient v
0:10:28once you so in the real assisting calibration a one when you mean and another
0:10:33is that it is not the union that i mean and
0:10:37the scores
0:10:40a quality with a where we can see that is consistent when interviews with a
0:10:45very high or station
0:10:49are you sure we got everything we on over and over
0:10:55so the based system for us your proposal by in address the source for calibration
0:11:01i
0:11:03i think five series systems with but like plus three system is not possible
0:11:12or
0:11:13usually might need them
0:11:16we have the fusion of existence
0:11:20and the basic progress is a thing with fusion be but obviously once she
0:11:29the best results that they want you can see that are the system also
0:11:35the present problems phones your feature
0:11:42no it's either a your problem of your results
0:11:47was also an analysis of our last for the nn are where lunges it was
0:11:52also for delay of advanced
0:11:54or the u s
0:11:58the first figure analyze this problem i phase you're
0:12:02so and we can see that score normalization provides more meetings in a savvy the
0:12:07in domain sre an eighteen
0:12:09also we can see that i mean by handle this problem i faced is that
0:12:14why
0:12:15provide some a similar guy
0:12:18great
0:12:19the second year so the was also a v i
0:12:24right and that we will one between their usage
0:12:27so the decision rule
0:12:30the relative improvement in bic studies
0:12:32log in this i mean idea of illness i in
0:12:35so systems and it is easier to the utterance
0:12:40these are the results of the single systems that we used in our submissions
0:12:46we can see that there is anything about christmas is to have that is that
0:12:50e d u
0:12:52these is too small
0:12:54so you systems for the reestimation by a significant
0:12:59all by n c l is be part of the nn a waitress
0:13:03there is no right in assigning from using y for a given in a network
0:13:08for this
0:13:12we use a real efficient is the input shows the system for fusion
0:13:16we just reading writing i
0:13:19includes your we still is involved in an a small step
0:13:24so you're right value is yes one system
0:13:27you'd reminding contrast to estimate ubm
0:13:31the misuse
0:13:33have a very similar a million this year use women right i have the base
0:13:38a once you
0:13:42now let's see the face recognition systems
0:13:47this is there may be a front end
0:13:50the pipeline is somewhat different for enrollment and test
0:13:53but elsewhere well
0:13:55phase of that still
0:13:57then enrollment
0:13:59we use the reference mumbles and you the test phase
0:14:04but overlap with the telephone calls
0:14:07in this will yes all the faces with it
0:14:11then we used the final
0:14:13modeling more on the original on a small line ungrounded phase and then we use
0:14:17that are facing varies
0:14:20we use briefly visited those and invariance
0:14:23you just be used every now and a snack implementations or within a face on
0:14:30our face unless you use the one d by the implementation
0:14:34we examine the task as a c n
0:14:39the video but since what are based on percent is for
0:14:43series system doesn't use score normalization for enrollment we average the enrollment embeddings
0:14:50and for the test side we do agglomerative clustering with twenty one clusters
0:14:56unless you listen we have several and robustness the
0:15:00but based methods also indicated in table we have
0:15:06you mean and variance
0:15:07averaged and variance the median of a multi clustering so turns you form an alliance
0:15:14you
0:15:15maybe also balanced young ones used for in somewhere in the media we go
0:15:20similar to his twitter that's they will i know fine inventing which is then weighted
0:15:25average
0:15:26all the meetings rooms
0:15:28in the attention approach we obtain a single embedding for the test with a weighted average
0:15:33of all the test embeddings
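Note: the averaging and weighted averaging described in this part of the talk can be sketched as below, assuming the items being pooled are face embedding vectors. How the weights are computed is not recoverable from the transcript, so they are passed in as a plain list. Both function names are hypothetical.

```python
def mean_embedding(embeddings):
    """Plain average of several embeddings (e.g. all enrollment faces)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    n, dim = len(embeddings), len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def weighted_embedding(embeddings, weights):
    """Weighted average of embeddings (e.g. all faces found in a test video).

    The weights are normalized to sum to one, so only their relative
    sizes matter.
    """
    if len(embeddings) != len(weights) or not embeddings:
        raise ValueError("need one weight per embedding")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    dim = len(embeddings[0])
    return [
        sum(w * e[d] for w, e in zip(weights, embeddings)) / total
        for d in range(dim)
    ]
```

With equal weights, `weighted_embedding` reduces to `mean_embedding`; unequal weights let some faces count more than others in the pooled representation.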
0:15:37but also
0:15:38and enrollment set
0:15:41no see the this problem model
0:15:46we have analysis the csp markets for this experiment we used in save face first
0:15:51one hundred and very
0:15:52the best figure is without is not understand your is it is not
0:15:57is not improve the low in the guns you one and it's a need in
0:16:01the
0:16:01well rules less in this study night in
0:16:05you one
0:16:07and the baseline and in the about is the
0:16:10made in enrollment bonuses are limited clustering in the that is
0:16:15well as in the other datasets
0:16:17the baseline peons overall only once the contents of attention
0:16:22there are more steam or impostors are statistics
0:16:29we compare the different and variance improve work of the us you by the question
0:16:34and now there was as follows we have
0:16:38the questions all the inside phase
0:16:43printing models
0:16:45we use the whole or can we use a already some enrollment and omit the
0:16:51last three test
0:16:55area so we can see that the white gaussian is better than a form the
0:17:00exact reason but is there a lot of in the network a very significant we
0:17:05can see that doesn't work on my personal
0:17:14this of the submission process
0:17:17then used primarily
0:17:20is a really use general
0:17:22the only last three assumes be systems on the taste of is a well this
0:17:26year
0:17:27this using a system is close to the right
0:17:30using a system is worse a posteriori because we're and based on we were or
0:17:38generally but one best so that no one
0:17:42analysis
0:17:44against a
0:17:46based on the equal error rate
0:17:49well no that's impossible
0:17:56this was also than one model
0:17:59in addition
0:18:01so for the fusion we assume independence between the audio and the video so
0:18:06we just sum the scores
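Note: under an independence assumption between modalities, log-likelihood-ratio scores add. The sketch below assumes the audio and video scores are already calibrated log-likelihood ratios, which the transcript does not confirm. The decision helper `accept` and its `target_prior` parameter are illustrative additions, not something described in the talk.

```python
import math


def fuse_llr(audio_llr, video_llr):
    """Fuse two modality scores assuming they are independent LLRs.

    If p(audio, video | H) = p(audio | H) * p(video | H) for both hypotheses,
    the joint log-likelihood ratio is the sum of the per-modality ratios.
    """
    return audio_llr + video_llr


def accept(llr, target_prior):
    """Bayes decision with equal error costs (illustrative only)."""
    if not 0.0 < target_prior < 1.0:
        raise ValueError("target_prior must be strictly between 0 and 1")
    threshold = math.log((1.0 - target_prior) / target_prior)
    return llr > threshold
```

For example, an audio score of 1.5 and a video score of -0.5 fuse to 1.0, which is accepted at a target prior of 0.5 (threshold 0) but rejected at a prior of 0.1.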
0:18:08in the figure we have a combination of more than useful single all those used
0:18:13in
0:18:14the additional value systems
0:18:16single videos used in a fisherman previous nist and finally in one more
0:18:21we can see that
0:18:22we can get huge improvements of eighty percent
0:18:27when we move from a single-modality system
0:18:29to the multimodal fusion
0:18:35okay
0:18:36the key will results was using be data
0:18:40the no more than one the one used
0:18:44well cts less money loss
0:18:46probably provide some woman we're got significant improvement of that a spectrum that for some
0:18:51backends we
0:18:52small liberal in domain
0:18:55they can perform better than listening
0:18:58what a probability of the screen but it was saying performance where
0:19:02without the need for every
0:19:05the results difference between as i the n-best and instantly in obvious that we wonder
0:19:10why is that the is fitting work
0:19:14so it is also studied in it has led with the transform it is because
0:19:18the italians or entity that or
0:19:22i mean doesn't in there is no
0:19:25so we won't remember a city bus always focus on the same the on the
0:19:30other side exactly you have already body was also incredibly or we don't want to
0:19:35solve problem
0:19:37we're really on all levels
0:19:39i mean and variance
0:19:41and organs performing very well
0:19:43i mean it is obvious what is only obviously modalities are when
0:19:49in the unimodal this so we will maybe that's came are used
0:19:55that's all from my side thank you