0:00:13 | and a causes b well i think we're interest and therefore above is the speaker |
---|
0:00:17 | recognition for telephone number of is one data |
---|
0:00:20 | usually my these submission form is design a |
---|
0:00:24 | this is a joint work from distances on the human language than only standard estimators
---|
0:00:29 | and standard orders |
---|
0:00:30 | it's and language processing from the two in my feeling |
---|
0:00:36 | these assigning the income tax |
---|
0:00:38 | CTS is like the telephone speech in Tunisian, i mean
---|
0:00:42 | and the audio-visual, composed of videos on the internet, derived from the VAST
---|
0:00:46 | corpus
---|
0:00:47 | there is one that has speaker recognition and face recognition only working model of
---|
0:00:52 | this |
---|
0:00:53 | all also used and what database or some formula one and very well why don't |
---|
0:00:58 | PLDA or cosine scoring, okay
---|
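The talk contrasts PLDA with cosine scoring of the embeddings. Here is a minimal sketch of cosine scoring. It assumes one embedding per row in NumPy arrays, and the function names are illustrative rather than taken from the talk. The score is the dot product of length-normalized embeddings:

```python
import numpy as np


def length_normalize(x: np.ndarray) -> np.ndarray:
    """Scale every row (one embedding per row) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_score(enroll: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Cosine similarity between every enrollment and every test embedding.

    Returns a matrix of shape (n_enroll, n_test).
    """
    return length_normalize(enroll) @ length_normalize(test).T


enroll = np.array([[1.0, 0.0], [0.0, 2.0]])
test = np.array([[3.0, 0.0], [1.0, 1.0]])
scores = cosine_score(enroll, test)
```

Because the vectors are normalized first, the score depends only on the angle between embeddings, not on their magnitude.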
0:01:01 | on the other side, the key points for women for still you what do you
---|
0:01:05 | really |
---|
0:01:06 | not is gonna the nn are businessmen |
---|
0:01:10 | still be lots kind of a place the nn vectors |
---|
0:01:13 | also |
---|
0:01:14 | but cannot they still using in the melee but mostly from any estimator fine-tuning
---|
0:01:19 | to in-domain data also
---|
0:01:22 | one of the key points was the usage of ResNet-based
---|
0:01:28 | embeddings
---|
0:01:29 | okay, we use cosine scoring in several areas, to combine
---|
0:01:33 | embeddings from different be there
---|
0:01:38 | again we use this on the i |
---|
0:01:40 | what i will face acoustic features that similar well for be overcome based detection |
---|
0:01:48 | problem that we just a speaker |
---|
0:01:50 | or face images |
---|
0:01:52 | and do not isn't based on but when there walking and we kind of in |
---|
0:01:56 | this course |
---|
0:02:00 | we start describing the audio systems
---|
0:02:04 | so we're starting with the different acoustic features we use, and this is used for units vectors
---|
0:02:09 | and build a lattice for rest of this vectors |
---|
0:02:12 | it's be we use community vad or sixty s and you don't and v for |
---|
0:02:17 | really |
---|
0:02:18 | in video we constantly system |
---|
0:02:23 | so what we'll from there is a sin was clustering or a PLDA GMM
---|
0:02:27 | that assigns speaker factors and speaker label posteriors
---|
0:02:33 | we used to estimate of labels |
---|
0:02:35 | based on similar you know and as the best one double will make me but |
---|
0:02:39 | is not very sure would be is generally |
---|
0:02:42 | also on responsiveness might consider are less money is |
---|
0:02:48 | this and that was one based on god i |
---|
0:02:53 | we got some improvement will is but i during |
---|
0:02:55 | i seriously we're finding the for the n and then what we in domain data |
---|
0:03:00 | just finding the leslie using four letter words in this way embodies becomes a sinus |
---|
0:03:06 | or |
---|
0:03:07 | and we call this |
---|
0:03:10 | besides discriminant percent |
---|
0:03:15 | so we have seven that is that or architectures |
---|
0:03:19 | we have |
---|
0:03:21 | i was gonna be and then but since |
---|
0:03:23 | three five basis |
---|
0:03:25 | than better since what we're gonna since the is the same that we use of |
---|
0:03:29 | and sre |
---|
0:03:31 | the contains translators from new domain |
---|
0:03:35 | with a linear size of |
---|
0:03:36 | one thousand four |
---|
0:03:39 | alright is an utterance |
---|
0:03:42 | we unknown |
---|
0:03:45 | and therefore based on find a we have regulators five miles away two thousand forty |
---|
0:03:52 | eight |
---|
0:03:53 | are very agreements |
---|
0:03:56 | we also several possible ways that five questions |
---|
0:04:00 | they're having less than wireless the inverse there wasn't one the and that's always been |
---|
0:04:05 | feeding |
---|
0:04:08 | this is one of the datasets used for training our x-vectors
---|
0:04:14 | so it's in serious condition |
---|
0:04:17 | we use Switchboard, which was designed for, okay
---|
0:04:21 | r c of this work |
---|
0:04:23 | it's |
---|
0:04:24 | there isn't or is something the in work we use all the data set someone |
---|
0:04:29 | one their completion |
---|
0:04:31 | a is evident in a we use the same but with the model |
---|
0:04:36 | we remove the so systems i one microphone |
---|
0:04:41 | Lincoln Labs the still use businesses
---|
0:04:45 | microphone |
---|
0:04:48 | confrontation or this though |
---|
0:04:50 | we used as i e one |
---|
0:04:52 | and i'm gonna is this study |
---|
0:04:55 | and you state |
---|
0:04:57 | or they are all from the LDC, and we just use the
---|
0:05:01 | most of the thing in this |
---|
0:05:06 | we have a for like principal equations |
---|
0:05:09 | c l is the only one last use the first configuration that's the line of |
---|
0:05:13 | the you're |
---|
0:05:16 | let's say that we have some out-of-domain and some in-domain data
---|
0:05:21 | first we and that the out-of-domain in domain using their or a little |
---|
0:05:26 | and they're all in an out-of-domain data in |
---|
0:05:30 | then we use a different thing that in for in-domain
---|
0:05:35 | and out-of-domain data
---|
0:05:37 | we use common whitening |
---|
0:05:38 | then the my face |
---|
0:05:41 | the other two in domain data |
---|
0:05:45 | and then the score normalization was done with in-domain data, and calibrated
---|
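The back-end steps described here can be sketched as follows: per-domain centering, a common whitening transform, then scoring. This illustration rests on assumptions. It reads "centering with each domain's own mean" and "common whitening" as a ZCA-style whitening matrix estimated on the pooled embeddings after each domain is centered with its own mean. `fit_backend` and `transform` are hypothetical names, and the LDA/PLDA stages are omitted:

```python
import numpy as np


def fit_backend(ood: np.ndarray, ind: np.ndarray, eps: float = 1e-6):
    """Estimate per-domain means and one common whitening matrix.

    ood, ind: out-of-domain and in-domain embeddings, one per row.
    """
    mu_ood, mu_ind = ood.mean(axis=0), ind.mean(axis=0)
    # Remove each domain's own mean, then pool both domains.
    pooled = np.vstack([ood - mu_ood, ind - mu_ind])
    cov = np.cov(pooled, rowvar=False)
    # ZCA whitening: W = V diag(1 / sqrt(lambda)) V^T, so cov(x @ W) is ~identity.
    eigval, eigvec = np.linalg.eigh(cov)
    whiten = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return mu_ood, mu_ind, whiten


def transform(x: np.ndarray, mu: np.ndarray, whiten: np.ndarray) -> np.ndarray:
    """Center with the matching domain mean, whiten, then length-normalize."""
    y = (x - mu) @ whiten
    return y / np.linalg.norm(y, axis=1, keepdims=True)


rng = np.random.default_rng(0)
ood = rng.normal(size=(500, 3)) * [1.0, 2.0, 3.0] + 5.0   # synthetic out-of-domain
ind = rng.normal(size=(400, 3)) * [2.0, 1.0, 0.5] - 1.0   # synthetic in-domain
mu_ood, mu_ind, whiten = fit_backend(ood, ind)
ind_clean = transform(ind, mu_ind, whiten)
```

Centering each domain with its own mean removes the domain offset before the shared whitening is estimated, so the whitening is not dominated by the shift between the two datasets.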
0:05:51 | but for steely and have a three by conventions |
---|
0:05:55 | something that and use for that all lda |
---|
0:05:58 | and the use yes everyday the lda for a swear and very nice thing what's |
---|
0:06:04 | almost instantly |
---|
0:06:05 | are also in the scoring or |
---|
0:06:09 | we also the lda for cases where |
---|
0:06:13 | and then it is then we only the model in salt |
---|
0:06:16 | or |
---|
0:06:20 | so this is a this what are the values something the markets |
---|
0:06:24 | that's a small difference between sites |
---|
0:06:27 | but as forces us ordinance yuri a on |
---|
0:06:32 | the use this study for then i x values to some well on this study |
---|
0:06:36 | in u one |
---|
0:06:38 | for the dc one |
---|
0:06:41 | as you use the is something at you by |
---|
0:06:45 | we also, since the only problem, we also use the unlabeled
---|
0:06:52 | data, labeling it by doing clustering
---|
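Clustering the unlabeled in-domain data yields pseudo speaker labels that a back end can then use. The talk does not say which clustering algorithm was used. Here is a minimal sketch that assumes average-linkage agglomerative clustering on cosine similarity with a stopping threshold. `ahc_cosine` and `threshold` are hypothetical names:

```python
import numpy as np


def ahc_cosine(emb: np.ndarray, threshold: float) -> list:
    """Average-linkage agglomerative clustering on cosine similarity.

    Merging stops when the most similar pair of clusters has a mean
    similarity below `threshold`. Returns one pseudo-label per embedding.
    """
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = x @ x.T
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(x)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ahc_cosine(emb, threshold=0.5)
```

This naive version is cubic in the number of segments, which is fine for illustration. A real system would use an optimized library implementation.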
0:06:56 | or other score normalization we use the only really |
---|
0:07:00 | i'm use the sre seen that for |
---|
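The talk does not spell out which score-normalization variant was used. As one common choice, here is a sketch of symmetric normalization (S-norm) with an optional adaptive top-N cohort selection. `s_norm`, `top_n` and the cohort arrays are assumptions for illustration:

```python
import numpy as np
from typing import Optional


def _cohort_stats(cohort: np.ndarray, top_n: Optional[int]):
    """Mean and std of the cohort scores, optionally only the top_n highest."""
    if top_n is not None:
        cohort = np.sort(cohort)[::-1][:top_n]
    return cohort.mean(), cohort.std()


def s_norm(raw: float, enroll_cohort: np.ndarray, test_cohort: np.ndarray,
           top_n: Optional[int] = None) -> float:
    """Symmetric score normalization (S-norm).

    enroll_cohort: scores of the enrollment side against a cohort set.
    test_cohort:   scores of the test side against the same cohort set.
    top_n:         if given, keep only the top_n highest cohort scores
                   (the "adaptive" variant).
    """
    mu_e, sd_e = _cohort_stats(enroll_cohort, top_n)
    mu_t, sd_t = _cohort_stats(test_cohort, top_n)
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)
```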
0:07:05 | or maybe a we just think that can almost the latter |
---|
0:07:08 | this is a very good speakers in the white honestly demos data |
---|
0:07:13 | score by bayesian also us |
---|
0:07:16 | the i have to be also provided us an significant improvement |
---|
0:07:21 | a value will use this i think bias you one for calibration |
---|
0:07:29 | that's you know this used the silence |
---|
0:07:32 | first we analyze the us also that five million and |
---|
0:07:37 | romana something the we use |
---|
0:07:40 | where a source false or misleading there |
---|
0:07:44 | on the on the lower a sliding i b d one all the |
---|
0:07:49 | the base then system used unsupervised really in a bayesian with this study only |
---|
0:07:56 | then in the signal were we is that in the u one okay |
---|
0:08:01 | provides a very nice |
---|
0:08:04 | then we i we are noise segmentation lately |
---|
0:08:09 | that improves the convince your in the u |
---|
0:08:13 | then we have that the a spectrum and also |
---|
0:08:16 | and the in domain be i get some room and you the by a small |
---|
0:08:20 | improvement |
---|
0:08:21 | all in one |
---|
0:08:23 | i think that if we change that sure or then run your that's where we |
---|
0:08:28 | made the grade on our way we |
---|
0:08:31 | getting some |
---|
0:08:32 | implementing that you well limbaugh an improvement in the |
---|
0:08:39 | also analysis on this you by also versa before rest |
---|
0:08:44 | the bayesian network use a risk of a system for based silence mean versus evaluation |
---|
0:08:49 | will also must present a unique |
---|
0:08:54 | then we alignments unless something dusty the data |
---|
0:08:58 | provides a nice improvement in the u and it again |
---|
0:09:01 | then we a the we got a number of channels in the network and that |
---|
0:09:06 | provides a small role |
---|
0:09:09 | not remote really okay and we define the never will always unusable sinus fourteen |
---|
0:09:15 | so on without use of us more ergonomically baseline but in there about their grace |
---|
0:09:22 | and they always fits to the or something or thirteen data |
---|
0:09:30 | and that's was in those identity |
---|
0:09:35 | these are also all four to all the single system |
---|
0:09:40 | the based system is your five better results before was one of the database sinus |
---|
0:09:45 | ability have okay |
---|
0:09:48 | so we're very close to be easily affected formal system for which channels |
---|
0:09:51 | a personal one of the |
---|
0:09:56 | and |
---|
0:09:57 | for this part of the nn with the |
---|
0:10:00 | will be the training set |
---|
0:10:03 | in all cases you was greater than this method was i |
---|
0:10:12 | we apply several
---|
0:10:15 | methods for the fusion we have there
---|
0:10:19 | but it's a you don't use of in it was used in calibration and yes |
---|
0:10:23 | is for a basis for |
---|
0:10:26 | an efficient v |
---|
0:10:28 | once you so in the real assisting calibration a one when you mean and another |
---|
0:10:33 | is that it is not the union that i mean and |
---|
0:10:37 | the scores |
---|
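Fusion and calibration of several systems are often done with linear logistic regression. It maps a weighted sum of the system scores plus a bias to a log-likelihood ratio. The talk does not state whether this is exactly the method used here. Here is a minimal Newton's-method sketch, in which `train_fusion` is a hypothetical name and the scores are synthetic:

```python
import numpy as np


def train_fusion(scores: np.ndarray, labels: np.ndarray,
                 iters: int = 30, l2: float = 1e-4):
    """Linear logistic-regression fusion/calibration trained with Newton's method.

    scores: (n_trials, n_systems) raw scores; labels: 1 = target, 0 = non-target.
    Returns (w, b) so that scores @ w + b approximates a log-likelihood ratio.
    """
    x = np.hstack([scores, np.ones((len(scores), 1))])  # append a bias column
    y = labels.astype(float)
    theta = np.zeros(x.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(x @ theta)))           # posterior of "target"
        grad = x.T @ (p - y) + l2 * theta
        hess = (x * (p * (1.0 - p))[:, None]).T @ x + l2 * np.eye(x.shape[1])
        theta -= np.linalg.solve(hess, grad)
    # Logistic regression gives log posterior odds at the training prior;
    # subtract that prior's log-odds so the output reads as an LLR.
    prior = y.mean()
    theta[-1] -= np.log(prior / (1.0 - prior))
    return theta[:-1], theta[-1]


rng = np.random.default_rng(0)
tar = rng.normal(0.5, 1.0, size=(10000, 2))    # synthetic target scores, 2 systems
non = rng.normal(-0.5, 1.0, size=(30000, 2))   # synthetic non-target scores
scores = np.vstack([tar, non])
labels = np.r_[np.ones(len(tar)), np.zeros(len(non))]
w, b = train_fusion(scores, labels)
fused = scores @ w + b
```

For these synthetic Gaussian scores, the true log-likelihood ratio is `s1 + s2`, so the learned weights should come out close to 1 and the bias close to 0. The prior offset matters because the class proportions in the training trials are unbalanced.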
0:10:40 | a quality with a where we can see that is consistent when interviews with a |
---|
0:10:45 | very high correlation
---|
0:10:49 | are you sure we got everything we on over and over |
---|
0:10:55 | so the based system for us your proposal by in address the source for calibration |
---|
0:11:01 | i |
---|
0:11:03 | i think five series systems with but like plus three system is not possible |
---|
0:11:12 | or |
---|
0:11:13 | usually might need them |
---|
0:11:16 | we have the fusion of existence |
---|
0:11:20 | and the basic progress is a thing with fusion be but obviously once she |
---|
0:11:29 | the best results that they want you can see that are the system also |
---|
0:11:35 | the present problems phones your feature |
---|
0:11:42 | no it's either a your problem of your results |
---|
0:11:47 | was also an analysis of our last for the nn are where lunges it was |
---|
0:11:52 | also for delay of advanced |
---|
0:11:54 | or the u s |
---|
0:11:58 | the first figure analyze this problem i phase you're |
---|
0:12:02 | so and we can see that score normalization provides more meetings in a savvy the |
---|
0:12:07 | in-domain SRE18
---|
0:12:09 | also we can see that i mean by handle this problem i faced is that |
---|
0:12:14 | why |
---|
0:12:15 | provide some a similar guy |
---|
0:12:18 | great |
---|
0:12:19 | the second year so the was also a v i |
---|
0:12:24 | right and that we will one between their usage |
---|
0:12:27 | so the decision rule |
---|
0:12:30 | the relative improvement in bic studies |
---|
0:12:32 | log in this i mean idea of illness i in |
---|
0:12:35 | so systems and it is easier to the utterance |
---|
0:12:40 | besides, the results of the single system that we used in all submissions
---|
0:12:46 | we can see that there is anything about christmas is to have that is that |
---|
0:12:50 | e d u |
---|
0:12:52 | these is too small |
---|
0:12:54 | so you systems for the reestimation by a significant |
---|
0:12:59 | all by n c l is be part of the nn a waitress |
---|
0:13:03 | there is no right in assigning from using y for a given in a network |
---|
0:13:08 | for this |
---|
0:13:12 | we use a real efficient is the input shows the system for fusion |
---|
0:13:16 | we just reading writing i |
---|
0:13:19 | includes your we still is involved in an a small step |
---|
0:13:24 | so you're right value is yes one system |
---|
0:13:27 | you'd reminding contrast to estimate ubm |
---|
0:13:31 | the misuse |
---|
0:13:33 | have a very similar a million this year use women right i have the base |
---|
0:13:38 | a once you |
---|
0:13:42 | now let's see the face recognition systems
---|
0:13:47 | this is there may be a front end |
---|
0:13:50 | the pipeline is something different for enrollment and test
---|
0:13:53 | but elsewhere well |
---|
0:13:55 | phase of that still |
---|
0:13:57 | then enrollment |
---|
0:13:59 | we use the reference mumbles and you the test phase |
---|
0:14:04 | but overlap with the telephone calls |
---|
0:14:07 | in this will yes all the faces with it |
---|
0:14:11 | then we used the final |
---|
0:14:13 | modeling more on the original on a small line ungrounded phase and then we use |
---|
0:14:17 | that are facing varies |
---|
0:14:20 | we use briefly visited those and invariance |
---|
0:14:23 | you just be used every now and a snack implementations of RetinaFace
---|
0:14:30 | for ArcFace we use the one d by the implementation
---|
0:14:34 | we examine the task as a c n |
---|
0:14:39 | the video but since what are based on percent is for |
---|
0:14:43 | this system doesn't use score normalization; for enrollment we average the enrollment embeddings
---|
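Averaging the enrollment embeddings into one model embedding can be sketched as below. The sketch assumes each embedding is length-normalized before and after averaging. That is an assumption: the talk only says the enrollment embeddings are averaged. `enroll_model` is a hypothetical name:

```python
import numpy as np


def enroll_model(embeddings: np.ndarray) -> np.ndarray:
    """Average length-normalized enrollment embeddings into one unit-norm model."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    mean = x.mean(axis=0)
    return mean / np.linalg.norm(mean)


model = enroll_model(np.array([[2.0, 0.0], [0.0, 5.0]]))
```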
0:14:50 | and the test set the new animated clustering with a twenty one clusters |
---|
0:14:56 | unless you listen we have several and robustness the |
---|
0:15:00 | but based methods also indicated in table we have |
---|
0:15:06 | you mean and variance |
---|
0:15:07 | averaged and variance the median of a multi clustering so turns you form an alliance |
---|
0:15:14 | you |
---|
0:15:15 | maybe also balanced young ones used for in somewhere in the media we go |
---|
0:15:20 | similar to his twitter that's they will i know fine inventing which is then weighted |
---|
0:15:25 | average |
---|
0:15:26 | all the meetings rooms |
---|
0:15:28 | with the attention, we obtain a single embedding for the test with a weighted average
---|
0:15:33 | of all the test embeddings
---|
0:15:37 | but also |
---|
0:15:38 | and enrollment set |
---|
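The attention-weighted average of the test embeddings can be sketched as a softmax over similarities. Two details are assumptions, not given in the talk: using the enrollment embedding as the query, and the `scale` temperature. `attention_pool` is a hypothetical name:

```python
import numpy as np


def attention_pool(test_emb: np.ndarray, query: np.ndarray, scale: float = 10.0):
    """Softmax-weighted average of test embeddings.

    Weights come from the cosine similarity of each test embedding to `query`;
    `scale` acts as an inverse temperature (0 gives a plain average).
    Returns the unit-norm pooled embedding and the weights.
    """
    x = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    logits = scale * (x @ q)
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    pooled = weights @ x
    return pooled / np.linalg.norm(pooled), weights


pooled, weights = attention_pool(np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([1.0, 0.0]))
```

A large `scale` concentrates the weight on the test faces most similar to the query. A scale of 0 reduces to plain averaging.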
0:15:41 | no see the this problem model |
---|
0:15:46 | we have analysis the csp markets for this experiment we used in save face first |
---|
0:15:51 | one hundred and very |
---|
0:15:52 | the best figure is without is not understand your is it is not |
---|
0:15:57 | is not improve the low in the guns you one and it's a need in |
---|
0:16:01 | the |
---|
0:16:01 | well rules less in this study night in |
---|
0:16:05 | you one |
---|
0:16:07 | and the baseline and in the about is the |
---|
0:16:10 | made in enrollment bonuses are limited clustering in the that is |
---|
0:16:15 | well as in the other datasets |
---|
0:16:17 | the baseline peons overall only once the contents of attention |
---|
0:16:22 | there are more steam or impostors are statistics |
---|
0:16:29 | we compare the different and variance improve work of the us you by the question |
---|
0:16:34 | and now there was as follows we have |
---|
0:16:38 | the questions all the inside phase |
---|
0:16:43 | printing models |
---|
0:16:45 | we use the whole or can we use a already some enrollment and omit the |
---|
0:16:51 | last three test |
---|
0:16:55 | area so we can see that the white gaussian is better than a form the |
---|
0:17:00 | exact reason but is there a lot of in the network a very significant we |
---|
0:17:05 | can see that doesn't work on my personal |
---|
0:17:14 | this of the submission process |
---|
0:17:17 | then used primarily |
---|
0:17:20 | is a really use general |
---|
0:17:22 | the only last three assumes be systems on the taste of is a well this |
---|
0:17:26 | year |
---|
0:17:27 | this using a system is close to the right |
---|
0:17:30 | using a system is worse a posteriori because we're and based on we were or |
---|
0:17:38 | generally but one best so that no one |
---|
0:17:42 | analysis |
---|
0:17:44 | against a |
---|
0:17:46 | based on the equal error rate |
---|
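The analysis is based on the equal error rate (EER), the operating point where the miss rate equals the false-alarm rate. Here is a minimal sketch, in which `compute_eer` is a hypothetical name. It sweeps the threshold over the sorted scores and returns the average of the two rates at the closest crossing, a common approximation:

```python
import numpy as np


def compute_eer(target_scores: np.ndarray, nontarget_scores: np.ndarray) -> float:
    """Approximate equal error rate from target and non-target scores."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores, kind="mergesort")]
    n_tar, n_non = labels.sum(), len(labels) - labels.sum()
    # Rejecting the k lowest scores (k = 0..n): misses are targets among them,
    # false alarms are the non-targets that remain above the threshold.
    p_miss = np.concatenate([[0.0], np.cumsum(labels)]) / n_tar
    p_fa = 1.0 - np.concatenate([[0.0], np.cumsum(1.0 - labels)]) / n_non
    idx = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[idx] + p_fa[idx])


eer = compute_eer(np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.5, -1.0]))
```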
0:17:49 | well no that's impossible |
---|
0:17:56 | this was also than one model |
---|
0:17:59 | in addition |
---|
0:18:01 | so for the fusion we assume independence between audio and video, so
---|
0:18:06 | we sum the scores
---|
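Suppose the audio and face observations are conditionally independent given the same-person or different-person hypothesis. Then the joint likelihood ratio factorizes into a product, so the joint log-likelihood ratio is the sum of the per-modality log-likelihood ratios. This is why calibrated scores can simply be added. Here is a minimal sketch. Falling back to the audio score when no face score is available is a hypothetical policy, not something stated in the talk:

```python
from typing import Optional


def fuse_audio_visual(llr_audio: float, llr_face: Optional[float]) -> float:
    """Fuse calibrated per-modality LLRs assuming conditional independence.

    Under independence the joint LLR is the sum of the audio and face LLRs.
    Returning the audio LLR alone when no face score exists is a hypothetical
    fallback, not a rule stated in the talk.
    """
    if llr_face is None:
        return llr_audio
    return llr_audio + llr_face
```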
0:18:08 | in the figure we have a combination of more than useful single all those used |
---|
0:18:13 | in |
---|
0:18:14 | the additional value systems |
---|
0:18:16 | single videos used in a fisherman previous nist and finally in one more |
---|
0:18:21 | we can see that |
---|
0:18:22 | we can get yours implement all eighty percent exactly |
---|
0:18:27 | when we will from a single of assistant |
---|
0:18:29 | who but it would be more efficient |
---|
0:18:35 | okay |
---|
0:18:36 | the key will results was using be data |
---|
0:18:40 | the no more than one the one used |
---|
0:18:44 | well cts less money loss |
---|
0:18:46 | probably provide some woman we're got significant improvement of that a spectrum that for some |
---|
0:18:51 | backends we |
---|
0:18:52 | small liberal in domain |
---|
0:18:55 | they can perform better than listening |
---|
0:18:58 | what a probability of the screen but it was saying performance where |
---|
0:19:02 | without the need for every |
---|
0:19:05 | the results difference between as i the n-best and instantly in obvious that we wonder |
---|
0:19:10 | why is that the is fitting work |
---|
0:19:14 | so it is also studied in it has led with the transform it is because |
---|
0:19:18 | the italians or entity that or |
---|
0:19:22 | i mean doesn't in there is no |
---|
0:19:25 | so we won't remember a city bus always focus on the same the on the |
---|
0:19:30 | other side exactly you have already body was also incredibly or we don't want to |
---|
0:19:35 | solve problem |
---|
0:19:37 | we're really on all levels |
---|
0:19:39 | i mean and variance |
---|
0:19:41 | and organs performing very well |
---|
0:19:43 | i mean it is obvious what is only obviously modalities are when |
---|
0:19:49 | in the unimodal this so we will maybe that's came are used |
---|
0:19:55 | that's all from my side, thank you
---|