| 0:00:06 | Okay, I'm going to talk about the work we did in the scope of the last NIST Speaker Recognition Evaluation. |
|---|
| 0:00:17 | This is the outline of my presentation: first I will try to motivate and introduce the problem we wanted to face and to solve. |
|---|
| 0:00:29 | Since my work has always been related to connectionist speech recognition, I will have a look at its very basics. |
|---|
| 0:00:39 | Then I will introduce the novel features obtained in this work, which we call transformation network features, for speaker recognition. |
|---|
| 0:00:47 | And then we will go to the experiments, the conclusions, and some future work ideas. |
|---|
| 0:00:56 | The main motivation was that we wanted to participate in the NIST evaluation, and we saw that the best systems were combining a large number of different subsystems. |
|---|
| 0:01:11 | I'm not going to mention them all, but there are many possible subsystems, and among all of them I was particularly attracted by what are usually called high-level features, which is in close relation with the previous session. |
|---|
| 0:01:27 | Basically, these systems use the speaker adaptation transforms employed in ASR systems as features for speaker detection, and they are proposed as alternatives to the cepstral features that are the most commonly used ones. |
|---|
| 0:01:42 | We build on the previous work on MLLR-based features; in fact the work presented here is very closely related, although it was developed in another framework, with some differences of course. |
|---|
| 0:01:54 | Basically, what is done in that work is to use the weights derived from MLLR transforms to produce high-dimensional feature vectors, concatenate them, and use these coefficients to model the speakers with support vector machines. |
|---|
| 0:02:13 | So, what is the problem? We have always been working with hybrid HMM/ANN speech recognition systems, based on neural networks. |
|---|
| 0:02:29 | I will show some of their characteristics later, but the main problem, and the motivation for this work, is that we cannot use the typical adaptation methods like MLLR that are usually used in Gaussian approaches. |
|---|
| 0:02:45 | so | 
|---|
| 0:02:45 | what i try | 
|---|
| 0:02:46 | to doing this work at the very beginning | 
|---|
| 0:02:49 | it began with | 
|---|
| 0:02:50 | to see if i can do something similar to the motor transformation for | 
|---|
| 0:02:53 | i've read | 
|---|
| 0:02:54 | uh | 
|---|
| 0:02:55 | um | 
|---|
| 0:02:56 | systems and if we can use it | 
|---|
| 0:02:57 | uh to obtain the speaker | 
|---|
| 0:02:59 | information into china | 
|---|
| 0:03:01 | a speaker discussion system | 
|---|
| 0:03:03 | and it with the farthest 'em | 
|---|
| 0:03:04 | some | 
|---|
| 0:03:05 | baseline systems in that in that | 
|---|
| 0:03:07 | it's very with us tonight | 
|---|
| 0:03:08 | uh telephone bill for condition | 
|---|
| 0:03:12 | Let me first review the basics of hybrid HMM/ANN recognition, for those to whom they are not very familiar. We have been working on this for some applications, mainly broadcast news transcription, but also telephone applications, and for some other languages. |
|---|
| 0:03:38 | The way these hybrid systems work is that we replace the Gaussian mixtures with a neural network, in our case a multilayer perceptron, and we use its probability estimates as the posterior probabilities of the single-state HMMs. |
|---|
| 0:04:01 | Usually we have relatively few outputs, just phonemes or some other sub-word units, but not many more. |
|---|
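As an aside, in a hybrid HMM/MLP decoder the network posteriors are usually turned into so-called scaled likelihoods by dividing by the class priors. This is a minimal sketch of that standard trick, not the speaker's actual code; all numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical posteriors P(q|x) from an MLP over four phone classes,
# and the class priors P(q) estimated from the training alignments.
posteriors = np.array([0.70, 0.15, 0.10, 0.05])
priors     = np.array([0.40, 0.30, 0.20, 0.10])

# In a hybrid HMM/MLP system the HMM emission score is the "scaled
# likelihood" p(x|q)/p(x) = P(q|x)/P(q), used by the Viterbi decoder
# in place of a Gaussian-mixture likelihood.
scaled_likelihoods = posteriors / priors
log_scores = np.log(scaled_likelihoods)
```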
| 0:04:12 | The main characteristics are that these networks are usually considered very good classifiers, that they make it easy to combine several feature streams, and that they perform pretty well, as we will see. |
|---|
| 0:04:25 | On the other hand, we have some problems with context modelling, and also with adaptation: the adaptation methods are not as well established as in Gaussian systems. |
|---|
| 0:04:37 | This is a block diagram of our recognition system. You can see several parallel streams with different features: PLP features, PLP with RASTA, and modulation spectrogram features. |
|---|
| 0:05:02 | For each stream there is a different multilayer perceptron, trained with transcriptions, and we merge the streams with a simple product rule. |
|---|
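The product-rule merging of the per-stream posteriors can be sketched as follows. This is a toy illustration with made-up posteriors; in the real system the per-frame MLP outputs are merged before decoding:

```python
import numpy as np

# Hypothetical frame posteriors from three feature streams
# (e.g. the PLP, RASTA-PLP and modulation-spectrogram MLPs) over 3 classes.
streams = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
])

# Simple product rule: multiply the per-stream posteriors
# (a log-domain sum, for numerical safety) and renormalise.
log_prod = np.sum(np.log(streams), axis=0)
combined = np.exp(log_prod - log_prod.max())
combined /= combined.sum()
```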
| 0:05:18 | These posterior probabilities are then used by the decoder, together with a language model, the lexicon, and the definitions of the HMMs, which give the relation between each network output and a phonetic unit, to produce the most likely word string or sentence. |
|---|
| 0:05:38 | Some characteristics of the system: it runs in less than one times real time, with an accuracy of around seventy percent, and it uses phonemes and some other phonetic units. |
|---|
| 0:05:58 | It is trained with about forty hours of speech, with a bigram language model that is an interpolation of the transcripts and written text from newspapers, and a relatively small vocabulary of about four thousand words. |
|---|
| 0:06:14 | For the evaluation data I needed to train a new speech recogniser, and I have to say it is a very, very weak system, because of the limited conversational telephone speech I had access to. |
|---|
| 0:06:31 | Basically, what I did was to train new multilayer perceptrons with the conversational data; there are some other differences in the system I use in this work, with different features and a different set of phonetic units. |
|---|
| 0:06:49 | I did some very informal evaluations, just to see for myself how it was working on conversational telephone data, and I got a very high word error rate. |
|---|
| 0:07:02 | But anyway, this recogniser is used for two purposes: first, to generate a phonetic alignment with the transcriptions provided by NIST, and also for training the speaker adaptation transformations. |
|---|
| 0:07:20 | So, how can we adapt a hybrid MLP network to the speaker information? There are several approaches, but basically two of them. |
|---|
| 0:07:32 | The first one would be, starting from a speaker-independent MLP network, to run additional iterations of the backpropagation algorithm on the adaptation data: we start with the trained network instead of random weights, and we update the weights. |
|---|
| 0:07:53 | The other thing we can do is to adapt only some of the weights, for instance the ones that go from the last hidden layer to the output layer. |
|---|
| 0:08:03 | Perhaps something more interesting is to modify the structure of the MLP network, trying not to modify the speaker-independent component. |
|---|
| 0:08:15 | For instance, at the phonetic level, we can add some kind of transformation at the output of the network and try to adapt it to the speaker characteristics; on the other hand, we can try to do the same at the acoustic level, adapting the input features to the characteristics of the speaker-independent system. |
|---|
| 0:08:41 | I did some tests of this last solution for ASR, just to verify that it could work, and it works; I found it was the best one for our application, so I decided to try it also for speaker recognition. |
|---|
| 0:08:58 | Here we have a typical MLP network with just one hidden layer: the input layer, the hidden layer, and the output layer. How can we train this adaptation transformation? |
|---|
| 0:09:13 | Basically, we incorporate a new linear layer at the input, and we apply the backpropagation algorithm as usual: we have data with labels, we make the forward propagation, compute the output of the network, compute the error, and backpropagate it. |
|---|
| 0:09:35 | But when it comes to updating the weights, we only update those of the linear input network, and we keep the speaker-independent component frozen. |
|---|
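The scheme just described, backpropagation through a frozen speaker-independent MLP where only the new linear input layer is updated, can be sketched as follows. This is a toy illustration with made-up sizes and random data, not the actual system; the real network is far larger and is trained on acoustic features:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical small speaker-independent MLP (frozen): 4 inputs,
# 8 hidden units, 3 phone classes.
d, h, k = 4, 8, 3
W1, b1 = rng.normal(0, 0.5, (h, d)), np.zeros(h)
W2, b2 = rng.normal(0, 0.5, (k, h)), np.zeros(k)

# Linear input network (LIN): a d x d matrix initialised to the
# identity; this is the only component updated during adaptation.
A = np.eye(d)

def adapt_step(x, y_idx, lr=0.1):
    """One backprop step that updates only the LIN weights."""
    global A
    z = A @ x                        # linear input transform
    hid = sigmoid(W1 @ z + b1)       # frozen hidden layer
    p = softmax(W2 @ hid + b2)       # frozen output layer
    # Cross-entropy gradient, propagated back through the frozen
    # layers down to the LIN; W1 and W2 themselves are not touched.
    dlogits = p.copy(); dlogits[y_idx] -= 1.0
    dhid = (W2.T @ dlogits) * hid * (1.0 - hid)
    dz = W1.T @ dhid
    A -= lr * np.outer(dz, x)
    return -np.log(p[y_idx])

# A few fixed epochs over the speaker's aligned frames, as in the talk.
frames = [(rng.normal(size=d), rng.integers(k)) for _ in range(20)]
losses = [np.mean([adapt_step(x, y) for x, y in frames]) for _ in range(5)]
```

After adaptation, A has moved away from the identity, and its coefficients are what later become the speaker features.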
| 0:09:50 | What can we say about this transformation? As I see it, it is intended to map the input of the current speaker to the representation in which the MLP performs best, so it can be considered a kind of speaker normalisation. |
|---|
| 0:10:08 | But it has some special characteristics, because we are not imposing any restriction on the adaptation process; I mean, we don't have a target speaker towards which we try to normalise the data. |
|---|
| 0:10:22 | And according to previous works, it seems that it is also architecture dependent: if we train the transformation network with one speaker-independent network behind it, and then we change to a speaker-independent network that has two hidden layers instead of one, it doesn't work anymore. So it has some kind of dependence on the architecture of the network. |
|---|
| 0:10:43 | We train it starting from an identity matrix, and when we use data of the same speaker, what we hope is that the transform captures the differences specific to that speaker, ending up with a kind of speaker model. I thought that this could be useful for speaker recognition. |
|---|
| 0:11:06 | so | 
|---|
| 0:11:06 | there so i stuck exactly the features | 
|---|
| 0:11:09 | i'd in the phonetic alignment with a nice | 
|---|
| 0:11:12 | the stations | 
|---|
| 0:11:13 | and | 
|---|
| 0:11:14 | train a speaker additions estimation for every segment | 
|---|
| 0:11:17 | and it's um | 
|---|
| 0:11:19 | a special things that they do is to remove | 
|---|
| 0:11:21 | long | 
|---|
| 0:11:22 | segments of silence to to avoid background and channel effect | 
|---|
| 0:11:25 | in the resulting features | 
|---|
| 0:11:27 | and i then just thinking of 'cause what edition that that this | 
|---|
| 0:11:30 | that is usually don't in the market | 
|---|
| 0:11:31 | in mlp training | 
|---|
| 0:11:33 | i just | 
|---|
| 0:11:34 | a place um | 
|---|
| 0:11:35 | fix the number five books and | 
|---|
| 0:11:36 | already said that this was that | 
|---|
| 0:11:38 | base and already sticks that they | 
|---|
| 0:11:39 | from the | 
|---|
| 0:11:41 | Another thing is that, instead of training a full matrix over the whole input, I tie the network. The input of the MLP is usually composed of the current frame and its context, so a full square matrix would have (number of features times context size) squared coefficients; by applying the same transform to each frame, independently of its position in the context, I reduce the size of the transformation considerably. |
|---|
| 0:12:23 | In addition to the coefficients of this tied transform, the final feature vector also stacks the feature mean and variance, because it is very usual to apply mean and variance normalisation at the input of the MLP. |
|---|
| 0:12:40 | And I do this for the different streams that we have: PLP, PLP with RASTA, modulation spectrogram, and MFCC. |
|---|
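Putting the pieces together, the per-segment feature vector is the tied input transform flattened, with the segment's per-dimension mean and variance appended, concatenated across streams. A sketch with made-up dimensionalities (the real streams and sizes differ):

```python
import numpy as np

def tn_features(A, frames):
    """Transformation-network feature vector for one stream: the tied
    per-frame LIN matrix flattened, plus the segment's per-dimension
    feature mean and variance (the normalisation statistics that are
    part of the MLP front end)."""
    return np.concatenate([A.ravel(), frames.mean(axis=0), frames.var(axis=0)])

# Hypothetical example: two streams with 3- and 2-dimensional frames;
# the real system uses four streams of much higher dimensionality.
rng = np.random.default_rng(1)
plp = rng.normal(size=(100, 3)); msg = rng.normal(size=(100, 2))
A_plp, A_msg = np.eye(3), np.eye(2)   # the adapted LINs would go here

vec = np.concatenate([tn_features(A_plp, plp), tn_features(A_msg, msg)])
# 3*3 + 3 + 3 = 15 dims for the first stream, 2*2 + 2 + 2 = 8 for the second
```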
| 0:12:49 | For modelling I use support vector machines: I train the speaker model with the speaker's feature vector as the positive example, and a background impostor set used as negative examples. I use libSVM with a linear kernel, and I apply rank normalisation to the input features. |
|---|
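The rank normalisation mentioned here is a common recipe in SVM speaker systems: each feature dimension is replaced by its relative rank within a background population. This is my own illustrative sketch, with a made-up Gaussian background set:

```python
import numpy as np

def rank_norm(x, background):
    """Rank-normalise vector x: each dimension becomes its rank within
    the background population for that dimension, scaled to [0, 1]."""
    out = np.empty_like(x, dtype=float)
    for j in range(background.shape[1]):
        col = np.sort(background[:, j])
        out[j] = np.searchsorted(col, x[j]) / len(col)
    return out

# Hypothetical background of 1000 impostor feature vectors, 5 dims.
rng = np.random.default_rng(2)
bg = rng.normal(size=(1000, 5))

z = rank_norm(np.zeros(5), bg)   # values near the N(0,1) median map near 0.5
```

The same normalisation is applied to the vectors used for SVM training and to the test vectors, so that all inputs live on a comparable scale.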
| 0:13:09 | So, let's go to the experiments. I used the NIST SRE 2008 data, only the tel-tel condition, and I used two competitive systems to verify the usefulness, or not, of this approach. |
|---|
| 0:13:25 | The first one is a quite simple GMM-UBM system based on cepstral features. I remove nonspeech frames based on the log energy, I do mean and variance normalisation over a short window, and all the typical things in GMM-UBM systems. |
|---|
| 0:13:52 | The background data set is the one used in previous SRE evaluations, and I also apply score normalisation. |
|---|
| 0:14:00 | In addition to that, a GMM supervector system, that is, a system that models the GMM supervectors with an SVM. For the negative set I derived the supervectors from speaker models trained on data from previous SRE evaluations. |
|---|
| 0:14:25 | I didn't apply score normalisation here, because I didn't see much improvement; probably there is some kind of problem in my configuration. |
|---|
| 0:14:36 | For calibration I used a gender-dependent calibration with the FoCal toolkit, and I did it in two steps: first a calibration for every single system, and later on another linear logistic regression in the case of fusing more than one system. |
|---|
| 0:14:58 | I did it with k-fold cross-validation on the same evaluation set, because I didn't have a separate development set for calibration, which is not ideal. |
|---|
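The calibration and fusion step, a linear logistic regression over subsystem scores in the style of the FoCal toolkit, can be sketched as follows. This is a simplified stand-in with plain gradient descent and synthetic scores, not the toolkit's actual algorithm:

```python
import numpy as np

def train_llr_fusion(scores, labels, epochs=2000, lr=0.5):
    """Learn weights w and offset b so that w . s + b acts as a
    calibrated log-likelihood-ratio score for each trial. The real
    toolkit uses a smarter optimiser and an effective-prior weighting."""
    n, m = scores.shape
    w, b = np.zeros(m), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))
        g = p - labels                      # d(cross-entropy)/d(logit)
        w -= lr * (scores.T @ g) / n
        b -= lr * g.mean()
    return w, b

# Hypothetical trial scores from two subsystems; label 1 = target trial.
rng = np.random.default_rng(3)
tar = rng.normal(1.5, 1.0, size=(200, 2))
non = rng.normal(-1.5, 1.0, size=(200, 2))
S = np.vstack([tar, non]); y = np.r_[np.ones(200), np.zeros(200)]
w, b = train_llr_fusion(S, y)
fused = S @ w + b
```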
| 0:15:15 | And here we have already some results. In blue you can see the curves of the individual transformation network systems, based on the different features: PLP, PLP with RASTA, modulation spectrogram, and MFCC. |
|---|
| 0:15:30 | Here you have the minimum detection cost function; I have to say that I use the cost proposed for SRE 2008, not the new one of 2009, and this is the equal error rate. |
|---|
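For reference, the minimum detection cost mentioned here, with the SRE 2008 parameters (C_miss = 10, C_fa = 1, P_target = 0.01), can be computed from a set of target and non-target scores like this (synthetic scores, for illustration only):

```python
import numpy as np

def min_dcf(tar_scores, non_scores, p_tar=0.01, c_miss=10.0, c_fa=1.0):
    """Minimum detection cost with the SRE 2008 parameters, swept over
    all thresholds and normalised by the best trivial system."""
    thresholds = np.sort(np.r_[tar_scores, non_scores])
    costs = []
    for t in thresholds:
        p_miss = np.mean(tar_scores < t)
        p_fa = np.mean(non_scores >= t)
        costs.append(c_miss * p_tar * p_miss + c_fa * (1 - p_tar) * p_fa)
    return min(costs) / min(c_miss * p_tar, c_fa * (1 - p_tar))

rng = np.random.default_rng(4)
dcf = min_dcf(rng.normal(2.0, 1.0, 500), rng.normal(-2.0, 1.0, 5000))
```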
| 0:15:46 | The first remark I want to make about this is that, well, it is not great, but it worked, and anyway I wasn't sure of that when I started. |
|---|
| 0:15:56 | Among the individual systems, we can see that the performance of the MFCC features is probably the best, but I don't have a good explanation: maybe because the feature size is bigger, although I'm not sure, or simply because that network, or its classifier, is better. |
|---|
| 0:16:12 | Then I did two other experiments: first, to fuse with linear logistic regression the four individual systems; and, even better, to concatenate the four weight-based feature vectors and to train a single SVM on the concatenated feature vector. |
|---|
| 0:16:33 | And we can see a nice improvement using the complete concatenated feature vector. |
|---|
| 0:16:43 | Moving to the next one, this is the comparison of the different baseline systems, together with the newly proposed transformation network system, the TN-SVM. |
|---|
| 0:16:56 | With respect to the GMM-UBM, it performs better close to the minimum-cost operating point, but it seems that it gets worse as we go closer to the equal error rate point. |
|---|
| 0:17:14 | With respect to the supervector system, we have a slightly worse performance close to the minimum-cost point, and it works better in the other direction, towards the equal error rate point. |
|---|
| 0:17:30 | What I think is important from these results is that I can achieve performance more or less similar to the baseline systems I am comparing to: in some cases a bit worse, in some cases a bit better, but not dramatically different. |
|---|
| 0:17:45 | The final purpose of this work was, in fact, trying to use it to improve the baseline systems, and these are the results of the combination. You can see several different combinations; these are the two baselines, and this is the minimum cost obtained. |
|---|
| 0:18:04 | And we can see that when we incorporate the transformation network features system, we have some improvement, in all the combinations here. |
|---|
| 0:18:28 | So, the conclusions. What I wanted to show in this work is that features derived from ANN speaker adaptation techniques can be used for speaker recognition, in a way very similar to how MLLR transforms are used with Gaussian systems. |
|---|
| 0:18:48 | I have used an adaptation technique, the transformation network, and feature vectors based on the coefficients of these adaptation transforms, together with the mean and variance of the input features, and they seem to perform quite well. |
|---|
| 0:19:07 | With respect to the baselines, we could see a relatively good performance: in some operating points of the curve it was better, in others it was worse, but more or less similar performances. |
|---|
| 0:19:23 | And we could verify that it provides some complementary speaker information to the one we have in our baseline systems. |
|---|
| 0:19:33 | With respect to current lines and future work, we are going on with these features. We need to assess a better classifier, or tweak our SVM setup, because we have very high-dimensional feature vectors and very few training examples. |
|---|
| 0:19:50 | We also want to improve the speech recogniser itself, both for the alignments and for the adaptation, because probably with a better speech recognition system we will have more meaningful features. |
|---|
| 0:20:02 | We did almost no tuning of the network architecture and its characteristics, and we probably should do something more in-depth: to understand the relation between the architecture of the speaker-independent network and the resulting features, or even to test other adaptation methods; I did not try the adaptation at the output of the network, at the phonetic level. |
|---|
| 0:20:28 | We are also studying some dimensionality reduction of these features, and applying inter-session variability compensation, like NAP, and other techniques that are known to work well in similar systems. |
|---|
| 0:20:42 | and to do something similar to | 
|---|
| 0:20:44 | what is done, um, for language identification | 
|---|
| 0:20:47 | that is | 
|---|
| 0:20:48 | to use | 
|---|
| 0:20:49 | several mlp | 
|---|
| 0:20:50 | uh networks from different languages and | 
|---|
| 0:20:53 | train the transformation networks for every one of these languages without | 
|---|
| 0:20:56 | phonetic alignment, and then | 
|---|
| 0:20:58 | to get a single feature vector | 
|---|
| 0:21:01 | and this way making the approach | 
|---|
| 0:21:04 | uh not need the asr transcriptions and finally | 
|---|
| 0:21:07 | making it also language independent | 
|---|
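The multilingual idea sketched above (several per-language MLP front-ends, each transforming the acoustic frames, concatenated into one language-independent feature vector per frame) might look roughly like the following. This is an editor's illustration: the network shapes, random weights, and feature sizes are made-up assumptions, not the actual system.

```python
import numpy as np

def mlp_forward(frames, w1, b1, w2, b2):
    """One per-language MLP front-end: sigmoid hidden layer,
    linear output taken as the transformed features."""
    h = 1.0 / (1.0 + np.exp(-(frames @ w1 + b1)))
    return h @ w2 + b2

rng = np.random.default_rng(0)
n_frames, n_in, n_hid, n_out = 100, 39, 50, 20

# Hypothetical per-language networks (random weights for illustration).
languages = {}
for lang in ("english", "spanish", "mandarin"):
    languages[lang] = (rng.normal(size=(n_in, n_hid)), np.zeros(n_hid),
                       rng.normal(size=(n_hid, n_out)), np.zeros(n_out))

frames = rng.normal(size=(n_frames, n_in))  # e.g. PLP frames plus context

# Concatenate the outputs of all language networks frame by frame:
# no phonetic alignment or ASR transcription is needed.
stacked = np.hstack([mlp_forward(frames, *languages[l])
                     for l in sorted(languages)])
print(stacked.shape)  # (100, 60): one stacked vector per frame
```

The stacked vectors could then feed the same SVM back-end as the monolingual features.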
| 0:21:09 | and | 
|---|
| 0:21:11 | that's all | 
|---|
| 0:21:13 | okay | 
|---|
| 0:21:24 | okay | 
|---|
| 0:21:24 | questions | 
|---|
| 0:21:30 | (audience question, largely inaudible: something about normalisation of the features) | 
|---|
| 0:22:13 | no um | 
|---|
| 0:22:14 | just rank normalisation of the input of the svm | 
|---|
| 0:22:18 | modelling | 
|---|
| 0:22:19 | uh i did it | 
|---|
| 0:22:20 | for modelling and also when i was doing testing, but i didn't do | 
|---|
| 0:22:24 | any other normalisation to the | 
|---|
| 0:22:26 | feature vectors | 
|---|
| 0:22:27 | they are in the range zero to one | 
|---|
| 0:22:28 | i think it's conventional | 
|---|
| 0:22:30 | in these | 
|---|
| 0:22:31 | uh support vector machine approaches | 
|---|
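The rank normalisation described in this answer (mapping each SVM input dimension to its empirical rank within a background set, so every feature ends up in the range zero to one) can be sketched as follows; the background data and dimensionality are made up for illustration.

```python
import numpy as np

def rank_normalize(background, x):
    """Map each dimension of x to the fraction of background values
    below it, giving a value in [0, 1] per dimension."""
    # background: (n_bg, dim) matrix of background feature vectors
    # x: (dim,) feature vector to normalise
    n_bg = background.shape[0]
    return (background < x).sum(axis=0) / n_bg

bg = np.random.default_rng(1).normal(size=(1000, 5))  # made-up background
x = np.zeros(5)
z = rank_normalize(bg, x)
print(z)  # each entry near 0.5 for a standard-normal background
```

The same background ranks would be applied at both modelling and testing time, as the answer notes.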
| 0:22:33 | (audience comment, partly inaudible: about whether some features were selected or treated differently) | 
|---|
| 0:22:42 | well, i understand, but it was up to the the svm | 
|---|
| 0:22:45 | we | 
|---|
| 0:22:46 | go with this; i mean, it will select | 
|---|
| 0:22:48 | the | 
|---|
| 0:22:49 | features that are more important, i mean | 
|---|
| 0:22:51 | i didn't treat them in a different way than | 
|---|
| 0:22:53 | if they were just coming from plp | 
|---|
| 0:22:55 | i just let the svm learn | 
|---|
| 0:22:58 | what it thought was better | 
|---|
| 0:23:00 | i didn't do anything | 
|---|
| 0:23:02 | in this way | 
|---|
| 0:23:08 | (audience question, partly inaudible: about the data used to train the ubm, and whether the neural network needs much more data) | 
|---|
| 0:23:32 | one more thing | 
|---|
| 0:23:34 | (question partly inaudible) | 
|---|
| 0:23:41 | uh, i think i didn't get it very well | 
|---|
| 0:23:45 | you're talking about | 
|---|
| 0:23:45 | whether we did it with a random initialisation of the mlp network | 
|---|
| 0:23:50 | for training | 
|---|
| 0:23:52 | oh | 
|---|
| 0:23:53 | that was the last layer | 
|---|
| 0:23:56 | the softmax | 
|---|
| 0:23:57 | yeah | 
|---|
| 0:23:58 | yeah | 
|---|
| 0:23:59 | (follow-up question, partly inaudible: about whether there is only the one softmax output) | 
|---|
| 0:24:18 | i have | 
|---|
| 0:24:18 | a softmax | 
|---|
| 0:24:19 | output here | 
|---|
| 0:24:21 | yeah | 
|---|
| 0:24:21 | and i don't have any other softmax output | 
|---|
| 0:24:24 | anyway, this is | 
|---|
| 0:24:25 | the linear input network and | 
|---|
| 0:24:27 | i'm not doing any kind of nonlinearity at this point | 
|---|
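The linear input network (LIN) adaptation being described here, a purely linear layer prepended to the frozen speaker-independent MLP with no extra nonlinearity and a single softmax at the output, can be sketched as below; the layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
dim, n_hid, n_phones = 39, 100, 40

# Frozen speaker-independent MLP (random weights for illustration).
w_hid = rng.normal(size=(dim, n_hid)); b_hid = np.zeros(n_hid)
w_out = rng.normal(size=(n_hid, n_phones)); b_out = np.zeros(n_phones)

# Speaker-dependent LIN: initialised to identity, trained per speaker.
# It is linear only -- no nonlinearity is inserted at this point.
lin = np.eye(dim)

x = rng.normal(size=(1, dim))            # one input frame (plus context)
h = sigmoid((x @ lin) @ w_hid + b_hid)   # frozen hidden layer
p = softmax(h @ w_out + b_out)           # the single softmax output layer
print(p.shape)  # (1, 40): one phone-posterior vector
```

During adaptation only `lin` would be updated by backpropagation; the MLP weights stay fixed.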
| 0:24:32 | it's a | 
|---|
| 0:24:34 | hmm, i think so | 
|---|
| 0:24:35 | but | 
|---|
| 0:24:35 | the | 
|---|
| 0:24:37 | no, there's no | 
|---|
| 0:24:38 | nonlinearity there; i dunno if i'm answering you | 
|---|
| 0:24:43 | i | 
|---|
| 0:24:43 | oh | 
|---|
| 0:24:47 | no, they didn't have; the input is just | 
|---|
| 0:24:50 | uh | 
|---|
| 0:24:51 | uh speech features | 
|---|
| 0:24:52 | yeah, plp or | 
|---|
| 0:24:54 | similar, or | 
|---|
| 0:24:55 | and | 
|---|
| 0:24:56 | sorry, and it's | 
|---|
| 0:24:57 | uh the | 
|---|
| 0:24:58 | the current frame and its context | 
|---|
| 0:25:00 | not only one frame; we use a context of | 
|---|
| 0:25:02 | a few frames | 
|---|
| 0:25:04 | but uh | 
|---|
| 0:25:05 | but it's speech features | 
|---|
| 0:25:11 | yeah | 
|---|
| 0:25:12 | so, would you go back to slide forty | 
|---|
| 0:25:19 | the table | 
|---|
| 0:25:20 | um | 
|---|
| 0:25:21 | uh | 
|---|
| 0:25:23 | as a baseline | 
|---|
| 0:25:24 | just, how much is it, | 
|---|
| 0:25:26 | uh | 
|---|
| 0:25:27 | how many | 
|---|
| 0:25:27 | map iterations | 
|---|
| 0:25:29 | ah, i did five, also probably for the supervector extraction | 
|---|
| 0:25:33 | um, but i think it was | 
|---|
| 0:25:35 | right | 
|---|
| 0:25:36 | this | 
|---|
| 0:25:36 | yeah | 
|---|
| 0:25:37 | yes, i did five map iterations | 
|---|
| 0:25:39 | yes, so i appreciate that, but | 
|---|
| 0:25:40 | uh, you you did | 
|---|
| 0:25:42 | five map iterations before | 
|---|
| 0:25:44 | doing | 
|---|
| 0:25:45 | your svm | 
|---|
| 0:25:47 | yeah | 
|---|
| 0:25:48 | so | 
|---|
| 0:25:49 | we found | 
|---|
| 0:25:50 | that | 
|---|
| 0:25:52 | one | 
|---|
| 0:25:53 | (rest of the comment partly inaudible) | 
|---|
| 0:25:59 | yeah uh | 
|---|
| 0:25:59 | but we verified | 
|---|
| 0:26:01 | well | 
|---|
| 0:26:02 | i'm not contradicting that, but | 
|---|
| 0:26:03 | uh in the basic gmm-ubm | 
|---|
| 0:26:05 | with five we got better; even if we go farther away | 
|---|
| 0:26:09 | we got | 
|---|
| 0:26:10 | uh | 
|---|
| 0:26:11 | a slight improvement | 
|---|
| 0:26:12 | but uh we hadn't verified it when we moved to the supervector | 
|---|
| 0:26:16 | system, uh | 
|---|
| 0:26:17 | and i realise, as you say, that | 
|---|
| 0:26:20 | this was not a good idea | 
|---|
| 0:26:22 | probably | 
|---|
| 0:26:22 | well that's | 
|---|
| 0:26:23 | probably um | 
|---|
| 0:26:25 | with the configuration i have | 
|---|
| 0:26:27 | uh | 
|---|
| 0:26:28 | okay | 
|---|
| 0:26:28 | i'm sure | 
|---|
| 0:26:29 | that the performance is poorer in the supervector | 
|---|
| 0:26:32 | system | 
|---|
| 0:26:33 | sure, i realise that | 
|---|
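The point being debated, that a single MAP iteration away from the UBM is the usual choice before supervector extraction, while running five iterations drifts further from the UBM, can be sketched roughly as below. The toy two-dimensional GMM, the relevance factor value, and means-only adaptation are an editor's assumptions for illustration.

```python
import numpy as np

def map_adapt_means(means, weights, var, data, r=16.0, n_iter=1):
    """Iterative MAP adaptation of diagonal-GMM means (means only),
    with relevance factor r; one iteration is the common choice
    before supervector extraction."""
    for _ in range(n_iter):
        # E-step: responsibilities of each mixture for each frame.
        d2 = ((data[:, None, :] - means[None]) ** 2 / var).sum(-1)
        log_g = np.log(weights) - 0.5 * d2 - 0.5 * np.log(var).sum(-1)
        log_g -= log_g.max(axis=1, keepdims=True)
        g = np.exp(log_g)
        g /= g.sum(axis=1, keepdims=True)
        # M-step with MAP smoothing toward the current means.
        n = g.sum(axis=0)                                  # soft counts
        ex = (g[:, :, None] * data[:, None, :]).sum(0) \
             / np.maximum(n, 1e-10)[:, None]
        alpha = (n / (n + r))[:, None]
        means = alpha * ex + (1 - alpha) * means
    return means

rng = np.random.default_rng(3)
ubm_means = rng.normal(size=(4, 2))          # toy 4-mixture UBM
w, v = np.full(4, 0.25), np.ones((4, 2))
data = rng.normal(loc=1.0, size=(200, 2))    # made-up speaker frames

m1 = map_adapt_means(ubm_means, w, v, data, n_iter=1)
m5 = map_adapt_means(ubm_means, w, v, data, n_iter=5)
print(m1.shape, m5.shape)  # adapted means, e.g. stacked into supervectors
```

Stacking the adapted means into one long vector would give the supervector fed to the SVM.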
| 0:26:35 | fig | 
|---|
| 0:26:51 | oh | 
|---|
| 0:26:51 | on the loss | 
|---|
| 0:26:53 | right | 
|---|
| 0:26:56 | X | 
|---|
| 0:26:57 | yeah | 
|---|
| 0:26:57 | okay | 
|---|
| 0:26:58 | oh | 
|---|
| 0:26:59 | sure | 
|---|
| 0:26:59 | uh_huh | 
|---|
| 0:27:00 | you | 
|---|
| 0:27:01 | yeah | 
|---|
| 0:27:01 | so | 
|---|
| 0:27:02 | how much | 
|---|
| 0:27:04 | oh | 
|---|
| 0:27:05 | hmmm | 
|---|
| 0:27:08 | oh | 
|---|
| 0:27:08 | and | 
|---|
| 0:27:09 | well | 
|---|
| 0:27:09 | it improves | 
|---|
| 0:27:10 | right | 
|---|
| 0:27:11 | the | 
|---|
| 0:27:12 | yeah | 
|---|
| 0:27:12 | no no, i didn't like it; it wasn't | 
|---|
| 0:27:14 | uh, too much; but probably there are | 
|---|
| 0:27:16 | oh | 
|---|
| 0:27:16 | configuration problems, because i see that people | 
|---|
| 0:27:19 | do get | 
|---|
| 0:27:20 | very nice improvements with nap | 
|---|
| 0:27:21 | no | 
|---|
| 0:27:22 | i don't know if it's because of the telephone-only | 
|---|
| 0:27:24 | uh data that they use | 
|---|
| 0:27:26 | yeah, they get the improvement; mine is not | 
|---|
| 0:27:28 | so, let's say, i tried with | 
|---|
| 0:27:30 | uh | 
|---|
| 0:27:30 | different dimensionalities | 
|---|
| 0:27:32 | and | 
|---|
| 0:27:34 | it improves | 
|---|
| 0:27:35 | yeah | 
|---|
| 0:27:35 | but it was not moving from | 
|---|
| 0:27:37 | i don't know how much i had here | 
|---|
| 0:27:39 | it wasn't moving from | 
|---|
| 0:27:42 | six point five nine to three | 
|---|
| 0:27:45 | it was less | 
|---|
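Nuisance attribute projection (NAP), the compensation discussed in this exchange, removes the leading directions of within-speaker (session) variability from the supervectors before SVM training. A rough sketch with made-up data; the dimensionality and corank here are illustrative, not the values used in the system:

```python
import numpy as np

rng = np.random.default_rng(4)
dim, corank = 50, 5

# Made-up supervectors: 20 speakers x 10 sessions each.
speakers = rng.normal(size=(20, 1, dim))
sessions = speakers + 0.5 * rng.normal(size=(20, 10, dim))

# Within-speaker (session) variation: subtract each speaker's mean.
within = (sessions - sessions.mean(axis=1, keepdims=True)).reshape(-1, dim)

# Leading right singular vectors of the within-speaker scatter
# span the nuisance subspace.
_, _, vt = np.linalg.svd(within, full_matrices=False)
u = vt[:corank].T                      # (dim, corank) nuisance directions

# NAP projection P = I - U U^T, applied to every supervector.
project = lambda x: x - (x @ u) @ u.T
compensated = project(sessions.reshape(-1, dim))
print(compensated.shape)               # (200, 50)
```

Varying `corank` corresponds to the "different dimensionalities" tried in the answer.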
| 0:27:46 | (question, partly inaudible: about the svm scores and probability estimates) | 
|---|
| 0:28:25 | if i did something not right, it would be because i didn't verify this since then | 
|---|
| 0:28:30 | okay, fine | 
|---|
| 0:28:31 | and | 
|---|
| 0:28:31 | um, i'm not sure | 
|---|
| 0:28:33 | i'm not sure; i'm just currently revising the | 
|---|
| 0:28:35 | uh svm, because probably | 
|---|
| 0:28:37 | i think it was using the probability estimation, and i suspect it's not a good idea | 
|---|
| 0:28:42 | but i was using it | 
|---|
| 0:28:42 | in both | 
|---|
| 0:28:43 | in both systems based on svm, i mean | 
|---|
| 0:28:46 | in the baseline and also in my proposal, so | 
|---|
| 0:28:48 | i think i can improve both in that way | 
|---|
| 0:28:51 | more or less what you were mentioning | 
|---|
| 0:28:52 | because they are doing the | 
|---|
| 0:28:54 | they are doing this kind of comparison with the background using | 
|---|
| 0:28:58 | the distance to the hyperplane | 
|---|
| 0:29:00 | and i think that | 
|---|
| 0:29:02 | the probability estimate | 
|---|
| 0:29:03 | is not as good as the score for the | 
|---|
| 0:29:05 | for the speaker identification | 
|---|
| 0:29:07 | task | 
|---|
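The closing point, that the raw SVM margin (distance to the separating hyperplane) is usually preferable to a Platt-scaled probability estimate as a verification score, can be illustrated with scikit-learn, assuming it is installed; the toy data and parameters are made up.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
# Two made-up classes standing in for target vs background supervectors.
X = np.vstack([rng.normal(-1, 1, size=(50, 2)),
               rng.normal(+1, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# probability=True enables Platt scaling via internal cross-validation.
clf = SVC(kernel="linear", probability=True).fit(X, y)

x = np.array([[0.2, 0.1]])
score = clf.decision_function(x)[0]   # signed distance to the hyperplane
proba = clf.predict_proba(x)[0, 1]    # Platt-scaled probability estimate
# Platt scaling squashes the margin into [0, 1]; for verification
# scoring the raw margin is often the better-behaved quantity.
print(float(score), float(proba))
```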
| 0:29:08 | but | 
|---|
| 0:29:09 | well | 
|---|
| 0:29:11 | okay | 
|---|