| 0:00:26 | So, this is the second talk about speaker diarization; again, we are trying to focus on a multistream approach. |
|---|
| 0:00:35 | The baseline technique we are using is the same as in the previous talk, which is the information bottleneck system. |
|---|
| 0:00:46 | As the title says, we are trying to look at the combination of outputs, or more precisely the combination of different streams at different levels. |
|---|
| 0:00:56 | These are only acoustic streams, so there is no prior information from [unclear] statistics. |
|---|
| 0:01:02 | A word on the authors: this work was done by [name unclear in the recording] [...]. |
|---|
| 0:01:14 | The introduction and motivation are the same as, or at least close to, those of the previous talk. |
|---|
| 0:01:21 | Again, we assume that the recordings we are working with are captured with multiple distant microphones. |
|---|
| 0:01:31 | As features, we use two kinds of acoustic features: MFCC features, which are fairly standard, and time-delay-of-arrival (TDOA) features. |
|---|
| 0:01:44 | TDOA features are quite complementary to MFCCs, and nowadays people use them quite a lot for diarization. |
|---|
| 0:01:56 | This combination of acoustic features within the information bottleneck technique gives state-of-the-art results in meeting diarization. |
|---|
| 0:02:12 | So, back to our motivation. Usually the feature streams are combined at the model level: there are separate models, e.g. GMMs, for the different feature streams, and their log-likelihoods are combined with some weighting. |
|---|
| 0:02:34 | There are also other approaches, such as voting schemes between diarization systems, initializing one system with the output of another, or some integrated approach. |
|---|
| 0:02:49 | Our question is whether these two kinds of acoustic features can be integrated using independent diarization systems rather than independent models; in other words, is there some advantage in using system combination rather than model combination? |
|---|
| 0:03:08 | What we mean by system and model combination will, I hope, become clear in the next two slides. |
|---|
| 0:03:19 | Maybe a last word on the outline of the talk. First, a few words about the information bottleneck principle which we use, applied here to single-stream diarization, so without any combination of features. |
|---|
| 0:03:34 | Then a few words about model-based combination, system-based combination and hybrid combination, and finally the experimental results. |
|---|
| 0:03:43 | Again, we are getting state-of-the-art results with the information bottleneck technique, without much computational complexity, and that is its main advantage. |
|---|
| 0:04:04 | How does the information bottleneck principle work? It is actually a fairly intuitive approach, borrowed from document clustering. Suppose that at the beginning we have a set of documents X that we want to cluster into clusters C, in our terminology. |
|---|
| 0:04:22 | What is added as side information is some variable Y which is of interest; we call it the relevance variable, and it should tell us something about the clustering. |
|---|
| 0:04:43 | In document clustering, this variable Y can be the words of the vocabulary, which of course carry information about the documents X. |
|---|
| 0:04:59 | We also assume that the conditional distribution p(y∣x), i.e. of Y given X, is available. |
|---|
| 0:05:06 | Going back to the problem of speaker diarization: our X is a set of speech elements, namely speech segments (again, we start from a uniform segmentation), and these need to be clustered into clusters C. |
|---|
| 0:05:29 | The information bottleneck principle states that the clustering should preserve as much information as possible between C and Y, while minimizing the distortion. This distortion can be seen as a kind of compression, for example; in our case it actually acts as a regularization. |
|---|
| 0:05:54 | The distortion, in our terms, is the mutual information between X and C, I(X;C). If you don't have it, the clustering is probably going to end up in one global cluster, which is not the case we want. |
|---|
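To make the trade-off concrete, here is a minimal Python sketch of the objective for a hard partition. It assumes the usual information bottleneck form F = I(C;Y) - (1/β) I(X;C), where β is a trade-off parameter that the talk does not name; the functions `mutual_information` and `ib_objective` are hypothetical names used only for illustration.

```python
import math


def mutual_information(joint):
    """I(A;B) in nats, from a joint distribution given as a list of rows p(a, b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_ab in enumerate(row):
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a[i] * p_b[j]))
    return mi


def ib_objective(p_x, p_y_given_x, labels, beta):
    """F = I(C;Y) - (1/beta) * I(X;C) for a hard partition (segment x -> labels[x])."""
    clusters = sorted(set(labels))
    index = {c: k for k, c in enumerate(clusters)}
    n_y = len(p_y_given_x[0])
    joint_cy = [[0.0] * n_y for _ in clusters]        # p(c, y)
    joint_xc = [[0.0] * len(clusters) for _ in p_x]   # p(x, c)
    for x, (px, c) in enumerate(zip(p_x, labels)):
        k = index[c]
        joint_xc[x][k] = px  # hard partition: p(c|x) is 1 for x's cluster, 0 otherwise
        for y in range(n_y):
            joint_cy[k][y] += px * p_y_given_x[x][y]
    return mutual_information(joint_cy) - mutual_information(joint_xc) / beta


# Two segments with clearly different relevance distributions.
p_x = [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.1, 0.9]]
print(ib_objective(p_x, p_y_given_x, [0, 1], beta=10.0))  # kept apart
print(ib_objective(p_x, p_y_given_x, [0, 0], beta=10.0))  # merged: 0.0
```

With β = 10 the two dissimilar segments score higher when kept apart; with a much smaller β (e.g. 1.0) the I(X;C) term dominates and merging scores higher, which is the compression/regularization trade-off in action.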
| 0:06:11 | So again, this is an intuitive approach, but in the end it can be proved that if we maximize this objective function, which is the mutual information I(C;Y) minus a Lagrange-weighted I(X;C), the problem turns into one where the probabilities p(y∣x) are compared using a simple divergence. |
|---|
| 0:06:52 | The point is that we don't need to search for some special divergence telling us which segments we should merge together: through the derivation we find that the Jensen-Shannon divergence should be used for clustering. |
|---|
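The following sketch shows how a Jensen-Shannon-based merge cost can be computed. It assumes the standard agglomerative IB formulation (the talk does not spell out the formula): merging clusters i and j lowers F by (p(c_i) + p(c_j)) times [JS over p(y|c) minus (1/β) JS over p(x|c)], with the JS weights set by the cluster priors. All function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(v * math.log(v) for v in p if v > 0)


def js_divergence(p, q, w_p, w_q):
    """Jensen-Shannon divergence of p and q with prior weights w_p + w_q = 1."""
    mixture = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mixture) - w_p * entropy(p) - w_q * entropy(q)


def merge_cost(p_ci, p_cj, py_ci, py_cj, beta):
    """Decrease of F = I(C;Y) - (1/beta) I(X;C) caused by merging clusters i and j.

    With a hard partition, p(x|c_i) and p(x|c_j) have disjoint supports, so their
    JS divergence equals the entropy of the prior weights (w_i, w_j).
    """
    total = p_ci + p_cj
    w_i, w_j = p_ci / total, p_cj / total
    d_y = js_divergence(py_ci, py_cj, w_i, w_j)
    d_x = entropy([w_i, w_j])
    return total * (d_y - d_x / beta)


same = merge_cost(0.25, 0.25, [0.9, 0.1], [0.9, 0.1], beta=10.0)
different = merge_cost(0.25, 0.25, [0.9, 0.1], [0.1, 0.9], beta=10.0)
print(same < different)  # True: similar clusters are cheaper to merge
```

The agglomerative step then simply picks the pair with the smallest cost.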
| 0:07:11 | So in the end the approach is pretty simple. It is an agglomerative algorithm: in each iteration we merge two clusters based on the Jensen-Shannon divergence, i.e. we take the two clusters with the smallest divergence and merge them. |
|---|
| 0:07:34 | We iterate until a stopping criterion is met. The stopping criterion is again pretty simple: it is the normalized mutual information between C and Y. |
|---|
| 0:07:51 | So, to summarize the approach: we have a stopping criterion and a way to measure the similarity between clusters, and it is pretty simple to code. |
|---|
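Since the procedure is described as simple to code, here is one way the greedy loop might look. This is a sketch under stated assumptions, not the authors' implementation: merges are ranked by the prior-weighted JS divergence between cluster distributions p(y|c) only (the large-β limit of the merge cost), and the loop stops before a merge that would push the normalized relevance information I(C;Y)/I(X;Y) below a hypothetical `nmi_threshold`.

```python
import math


def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)


def js_divergence(p, q, w_p, w_q):
    mixture = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mixture) - w_p * entropy(p) - w_q * entropy(q)


def relevance_information(p_c, p_y_given_c):
    """I(C;Y) from cluster priors p(c) and relevance distributions p(y|c)."""
    n_y = len(p_y_given_c[0])
    p_y = [sum(pc * row[y] for pc, row in zip(p_c, p_y_given_c)) for y in range(n_y)]
    return sum(
        pc * row[y] * math.log(row[y] / p_y[y])
        for pc, row in zip(p_c, p_y_given_c)
        for y in range(n_y)
        if row[y] > 0
    )


def aib_cluster(p_x, p_y_given_x, nmi_threshold):
    """Greedy agglomerative IB clustering of segments into a hard partition."""
    members = [[x] for x in range(len(p_x))]
    p_c = list(p_x)
    p_y_c = [list(row) for row in p_y_given_x]
    i_xy = relevance_information(p_c, p_y_c)  # singleton clusters: I(C;Y) = I(X;Y)
    if i_xy <= 0:
        return [list(range(len(p_x)))]  # no relevance information to preserve
    while len(members) > 1:
        best = None
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                total = p_c[i] + p_c[j]
                cost = total * js_divergence(p_y_c[i], p_y_c[j], p_c[i] / total, p_c[j] / total)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        total = p_c[i] + p_c[j]
        merged = [(p_c[i] * a + p_c[j] * b) / total for a, b in zip(p_y_c[i], p_y_c[j])]
        # Candidate state after the merge: cluster j removed (j > i), cluster i replaced.
        new_p_c = [pc for k, pc in enumerate(p_c) if k != j]
        new_p_y_c = [row for k, row in enumerate(p_y_c) if k != j]
        new_p_c[i], new_p_y_c[i] = total, merged
        if relevance_information(new_p_c, new_p_y_c) / i_xy < nmi_threshold:
            break  # stopping criterion: too much relevance information would be lost
        members[i] = members[i] + members[j]
        del members[j]
        p_c, p_y_c = new_p_c, new_p_y_c
    return members


segments = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
print(aib_cluster([0.25] * 4, segments, nmi_threshold=0.5))
```

The threshold value here is illustrative; in practice it would be tuned on development data.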
| 0:08:14 | Just a bit more about the probabilities involved: we suppose that p(c∣x), where C is a cluster and X an input segment, is a hard partition, meaning each segment belongs to only one cluster, with no soft weighting between several clusters. |
|---|
| 0:08:39 | Then p(y∣c) is the relevance-variable distribution, which is used for the merging. |
|---|
| 0:08:55 | Everything should become clearer with this figure. Suppose we have input speech which is uniformly segmented, for example MFCC features in this single-stream approach. |
|---|
| 0:09:09 | We have the elements X and the relevance variables Y. I still haven't said what these are, but it's probably intuitive: in our case they are the components of a universal background model trained on the entire speech recording. |
|---|
| 0:09:23 | This defines the relevance variables used for the clustering. What you see in the middle are the vectors p(y∣x), i.e. the probabilities of the relevance variables Y given the input segments. |
|---|
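A hedged sketch of how p(y|x) could be obtained when the relevance variables are UBM components. The assumptions are loud ones: a scalar feature instead of MFCC vectors, a toy two-component UBM, and p(y|x) for a segment taken as the average of its frame-level component posteriors (the talk does not say exactly how the segment-level probabilities are formed). All names are hypothetical.

```python
import math


def log_gaussian(v, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def frame_posteriors(frame, weights, means, variances):
    """p(y | frame) over UBM components y, via Bayes' rule in the log domain."""
    logs = [math.log(w) + log_gaussian(frame, m, s2)
            for w, m, s2 in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]  # subtract the max to avoid underflow
    norm = sum(exps)
    return [e / norm for e in exps]


def segment_relevance(segment, weights, means, variances):
    """Assumed estimate of p(y|x) for one segment: mean of its frame posteriors."""
    posts = [frame_posteriors(f, weights, means, variances) for f in segment]
    return [sum(p[y] for p in posts) / len(posts) for y in range(len(weights))]


# Hypothetical two-component UBM on a scalar feature.
ubm = dict(weights=[0.5, 0.5], means=[-2.0, 2.0], variances=[1.0, 1.0])
print(segment_relevance([1.8, 2.1, 2.4], **ubm))  # mass concentrated on component 1
```

A real system would use multivariate diagonal-covariance GMMs over feature vectors; stacking one such row per segment gives the p(y|x) matrix shown in the middle of the figure.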
| 0:09:42 | Then comes the clustering, which is again the agglomerative technique, and in the end we get an initial segmentation. Finally, we do a refinement by training GMMs and doing Viterbi decoding. |
|---|
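The refinement step is only named in the talk. As a simplified stand-in, the sketch below runs a plain Viterbi pass over per-frame log-likelihoods under each cluster's model, with a constant penalty for switching speakers. The GMM training itself and any minimum-duration topology of the real system are not shown, and `switch_penalty` is a hypothetical parameter.

```python
def viterbi_realign(loglik, switch_penalty):
    """Best cluster per frame; loglik[t][k] is log p(frame t | cluster k)."""
    n_frames, n_clusters = len(loglik), len(loglik[0])
    score = list(loglik[0])
    back = []  # back[t-1][k]: best predecessor of cluster k at frame t
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for k in range(n_clusters):
            # Staying in k is free; switching from another cluster costs the penalty.
            best_prev, best_val = k, score[k]
            for j in range(n_clusters):
                if j != k and score[j] - switch_penalty > best_val:
                    best_prev, best_val = j, score[j] - switch_penalty
            new_score.append(best_val + loglik[t][k])
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Backtrace from the best final state.
    k = max(range(n_clusters), key=lambda c: score[c])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]


frames = [[0.0, -5.0], [0.0, -5.0], [-1.0, 0.0], [0.0, -5.0]]
print(viterbi_realign(frames, switch_penalty=0.0))   # follows the per-frame best cluster
print(viterbi_realign(frames, switch_penalty=10.0))  # a short excursion is smoothed away
```

The penalty plays the role of a speaker-turn prior: a larger value suppresses very short speaker changes.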
| 0:09:58 | Now let's go back to feature combination. For feature combination based on background models, suppose we have two features, again MFCC and TDOA, and two background models, each trained on one of these features. |
|---|
| 0:10:18 | What we can simply do is linearly weight these p(y∣x) vectors of probabilities with some weights, which gives a new matrix of probabilities for the segments. |
|---|
| 0:10:34 | How do we get these weights? We estimate them on development data, i.e. on data different from the data we diarize. Once we have the combined p(y∣x), the rest of the diarization system is the same: the combination happens only at the beginning, where we combine the relevance variables, and then we just run the iterative clustering. |
|---|
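A minimal sketch of this model-level combination. It assumes one reading of "linearly weighting" the two p(y|x) matrices: each stream has its own UBM, so the combined relevance variable ranges over the union of both component sets, and each segment's combined vector is the weighted concatenation of its two stream vectors. The function name is hypothetical.

```python
def combine_model_level(p_mfcc, p_tdoa, w_mfcc, w_tdoa):
    """Combine per-segment relevance distributions of two streams before clustering.

    p_mfcc[x] and p_tdoa[x] are p(y|x) over each stream's own UBM components; the
    result is their weighted concatenation, a distribution over the union of both.
    """
    if abs(w_mfcc + w_tdoa - 1.0) > 1e-9:
        raise ValueError("stream weights must sum to 1")
    return [[w_mfcc * v for v in a] + [w_tdoa * v for v in b]
            for a, b in zip(p_mfcc, p_tdoa)]


# 0.7 / 0.3 echo values mentioned later in the talk, used here only as an example.
combined = combine_model_level([[0.5, 0.5]], [[1.0, 0.0, 0.0]], 0.7, 0.3)
print(combined)
```

Because the weights sum to one, each combined row is still a probability distribution, so the clustering code can consume it unchanged.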
| 0:11:00 | This is actually not new; it has already been published [...] at the last Interspeech. Here is again a diagram of how it is done: there is a matrix of p(y∣x) probability vectors, they are simply combined, and then there is the clustering and the refinement. |
|---|
| 0:11:25 | Now, what is actually new, and what we propose in this paper, is multiple-system combination: instead of doing the combination before clustering, what happens if we do it after clustering? |
|---|
| 0:11:41 | Again, we have two background models trained on different features, and in the end two diarization systems, each of which iteratively produces clusters. The stopping criterion can actually differ between them, meaning we can have a different number of clusters for feature A and feature B. In the end we get p(y∣c). |
|---|
| 0:12:09 | To go back from these clusters to the initial segmentation, i.e. to p(y∣x), we just do a simple Bayesian operation. |
|---|
| 0:12:20 | Again, here is a picture of how this is done. There are two diarization systems, each doing a complete clustering, and in the end each of them gives some clusters. To get back to p(y∣x) for the initial segments, we apply these simple operations and just integrate over all clusters C. |
|---|
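Here is a sketch of the system-level combination under a hard partition. Each stream is clustered on its own; its cluster distributions p(y|c) are pooled from the member segments, then mapped back with p(y|x) = Σ_c p(y|c) p(c|x), which for a hard p(c|x) simply copies the row of the segment's cluster. The mapped-back streams are then weighted and concatenated, assuming the same combination rule as in the model-level case. All names are hypothetical.

```python
def cluster_relevance(p_x, p_y_given_x, labels):
    """p(y|c): sum over x in c of p(x) p(y|x), divided by p(c) (hard partition)."""
    mass, acc = {}, {}
    for px, row, c in zip(p_x, p_y_given_x, labels):
        mass[c] = mass.get(c, 0.0) + px
        acc.setdefault(c, [0.0] * len(row))
        acc[c] = [a + px * v for a, v in zip(acc[c], row)]
    return {c: [a / mass[c] for a in acc[c]] for c in acc}


def back_to_segments(p_y_given_c, labels):
    """p(y|x) = sum over c of p(y|c) p(c|x); with a hard p(c|x) this is x's cluster row."""
    return [list(p_y_given_c[c]) for c in labels]


def combine_system_level(p_x, streams, weights):
    """streams: one (p_y_given_x, labels) pair per independently clustered stream."""
    combined = [[] for _ in p_x]
    for (p_y_given_x, labels), w in zip(streams, weights):
        p_y_given_c = cluster_relevance(p_x, p_y_given_x, labels)
        for x, row in enumerate(back_to_segments(p_y_given_c, labels)):
            combined[x].extend(w * v for v in row)  # weighted concatenation
    return combined


stream_a = ([[1.0, 0.0], [0.5, 0.5]], [0, 0])  # both segments in one cluster
stream_b = ([[0.2, 0.8], [0.4, 0.6]], [0, 1])  # segments kept apart
out = combine_system_level([0.5, 0.5], [stream_a, stream_b], [0.5, 0.5])
print(out)
```

The mapped-back vectors are pooled over whole clusters, which is the "estimated on more data" argument made next in the talk.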
| 0:12:54 | Why should this work? Again, it is pretty intuitive: the p(y∣x) used in this combination are estimated on a large amount of data, not on the short segments as in the case of combination before clustering. Each p(y∣c) is estimated on a lot of data because, of course, only a few clusters remain at the end. |
|---|
| 0:13:24 | The third approach is a hybrid system, which is simply a combination of the two: combination both before and after clustering. In one stream we do a system combination, and then we combine that output with the other stream, which is combined before clustering. Maybe it's clearer in this figure. |
|---|
| 0:13:59 | There are two streams. In the first, we do the system combination: we do the clustering, and from the p(y∣c) vectors we go back to p(y∣x) to get initial probabilities for the segmentation. In the second, we just take that stream's p(y∣x) and combine it before clustering. The two are then simply combined, we get a p(y∣x) matrix, and the rest is the same as before. |
|---|
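A sketch of the hybrid combination, assuming the same weighted-concatenation rule as above. One stream (the talk later suggests TDOA) is clustered on its own and mapped back to segments through its cluster distributions; the other stream (MFCC) contributes its raw per-segment p(y|x). The function name and example values are hypothetical.

```python
def hybrid_combination(p_x, sys_stream, model_p_y_given_x, w_sys, w_model):
    """Hybrid: one stream combined after clustering, the other before clustering.

    sys_stream: (p_y_given_x, labels) for the stream that was clustered on its own.
    model_p_y_given_x: raw per-segment p(y|x) of the other stream.
    """
    p_y_given_x, labels = sys_stream
    # System-level part: pool each cluster's segments into p(y|c).
    mass, acc = {}, {}
    for px, row, c in zip(p_x, p_y_given_x, labels):
        mass[c] = mass.get(c, 0.0) + px
        acc[c] = [a + px * v for a, v in zip(acc.get(c, [0.0] * len(row)), row)]
    p_y_given_c = {c: [a / mass[c] for a in acc[c]] for c in acc}
    # Map back to segments (hard partition) and concatenate with the raw stream.
    return [[w_sys * v for v in p_y_given_c[c]] + [w_model * v for v in row]
            for c, row in zip(labels, model_p_y_given_x)]


mfcc = [[1.0, 0.0], [0.0, 1.0]]
tdoa = ([[0.9, 0.1], [0.2, 0.8]], [0, 0])  # TDOA system put both segments together
print(hybrid_combination([0.5, 0.5], tdoa, mfcc, 0.5, 0.5))
```

When the clustered stream keeps every segment in its own cluster, the hybrid reduces to plain model-level combination; the more that stream merges, the more its part of each vector is shared and smoothed across segments.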
| 0:14:38 | Of course, there are two possible cases, depending on which kind of combination is applied to which stream. This can be seen in the results table, but again it is maybe intuitive how it should be done. So let me say a few words about the experiments. |
|---|
| 0:14:52 | We are using the same Rich Transcription data, i.e. NIST meetings, so no AMI data, only Rich Transcription, with the MFCC features and the TDOA features. |
|---|
| 0:15:08 | The speech comes from a single enhanced speech signal. Again, the weights are estimated on a development set. As before, we only measure the diarization error rate with respect to speaker error, not speech/nonspeech error. |
|---|
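To illustrate what "speaker error only" means, here is a simplified frame-level sketch: non-speech frames in the reference are skipped, hypothesis clusters are mapped one-to-one onto reference speakers by brute force, and the best mapping decides which frames count as correct. This is an illustrative assumption, not the official scoring tool, which also handles overlapping speech, forgiveness collars and time-based rather than frame-based scoring.

```python
from itertools import permutations


def speaker_error_rate(reference, hypothesis):
    """Speaker error over reference speech frames under the best one-to-one mapping.

    reference: per-frame reference speaker labels, None for non-speech frames.
    hypothesis: per-frame hypothesis cluster labels (no None on speech frames).
    Non-speech frames are skipped, so speech/non-speech errors are not counted.
    """
    frames = [(r, h) for r, h in zip(reference, hypothesis) if r is not None]
    refs = sorted({r for r, _ in frames})
    hyps = sorted({h for _, h in frames})
    # Pad with None so extra hypothesis clusters can map to "no speaker" (always wrong).
    targets = refs + [None] * max(0, len(hyps) - len(refs))
    best_correct = 0
    for assignment in permutations(targets, len(hyps)):
        mapping = dict(zip(hyps, assignment))
        correct = sum(1 for r, h in frames if mapping[h] == r)
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / len(frames)


ref = ["A", "A", "B", "B", None]
hyp = ["s1", "s1", "s2", "s1", "s2"]
print(speaker_error_rate(ref, hyp))  # 0.25: one of four speech frames is wrong
```

The brute-force search over permutations is only practical for a handful of speakers; an optimal assignment algorithm would be used at scale.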
| 0:15:31 | Here are the results. If you remember from the previous talk, the baseline, using the single-stream technique with just MFCC features, was around 15.5 percent. When we combine MFCC and TDOA features, both with the information bottleneck technique and with the HMM/GMM system, we get down to around twelve percent. |
|---|
| 0:16:01 | The second part of the table just gives the weights for the different features. They differ because the quantities are different: in our case we combine probabilities, whereas in the HMM/GMM case they are log-likelihoods, which is why the weights also differ. Again, in our case the combination is done using concatenation of the relevance variables, and you can see how it performs against the HMM/GMM system. |
|---|
| 0:16:35 | These are the results for combination after clustering, i.e. combination at the system level, as we call it. The baseline here ([unclear] point six percent) comes from the previous table. When we do system combination, meaning we combine the relevance variables after clustering, you can see we get a pretty large improvement, almost forty percent. |
|---|
| 0:17:04 | Then there are, of course, two possible hybrid combinations of system and model weighting. Again, it looks pretty straightforward that it's better to do system combination, or system weighting, with the TDOA features, because they are usually noisier and probably need more data to be well estimated, or at least their relevance variables need more data to be well estimated. MFCC features, on the other hand, seem to work much better, so that's the reason. |
|---|
| 0:17:38 | You may also look at the table: the weights we need to estimate move closer to those of system combination, so instead of 0.7 and 0.3 we go to 0.8, and since they are estimated on different data, they should generalize in this case. |
|---|
| 0:18:02 | Just to explain why we are possibly getting such an improvement: look at the single-stream results for each meeting, seventeen meetings in this case, for model combination and system combination, and at the bottom, which is just the plain MFCC and TDOA information bottleneck techniques, with no combination of features. |
|---|
| 0:18:27 | You may see that most of the improvement comes in cases where there is a big gap between those two single-stream techniques. Where there is no gap, of course, you don't get an improvement, but where there is a big gap between the MFCC and TDOA single streams, system combination works pretty well for such a meeting. |
|---|
| 0:18:51 | Just to conclude the paper: we presented a new technique, or a new way of combining acoustic streams. Rather than weighting the acoustic features before clustering, as we did before, we present a technique which does the weighting after clustering. |
|---|
| 0:19:11 | The reason is simple: the relevance variables, which are then used to merge different clusters or segments, are estimated on more data, not just on short segments. As seen in the results, we get a pretty good improvement with such a technique: forty percent over all seventeen meetings. |
|---|
| 0:19:43 | I think I'm done. |
|---|
| 0:19:46 | [Session chair; largely unintelligible in the recording.] |
|---|
| 0:19:53 | [...] I don't think there is any specific question for them. |
|---|
| 0:20:01 | Yeah. [Remainder unintelligible.] |
|---|