| 0:00:15 | i | 
|---|
| 0:00:20 | she had your dark suit in greasy wash water all year | 
|---|
| 0:00:25 | so, session number eight, on features for speaker recognition; we've got five | 
|---|
| 0:00:31 | papers | 
|---|
| 0:00:32 | that will be presented in the session | 
|---|
| 0:00:34 | there is a bit of time | 
|---|
| 0:00:37 | before we need to leave | 
|---|
| 0:00:38 | for this evening's event, so | 
|---|
| 0:00:44 | we can actually run a little bit over afterwards for | 
|---|
| 0:00:46 | discussion | 
|---|
| 0:00:48 | so the first talk is on feature extraction using two-dimensional autoregressive models for | 
|---|
| 0:00:54 | speaker recognition, from the Johns Hopkins group | 
|---|
| 0:00:57 | who will be presenting the paper | 
|---|
| 0:01:02 | oh | 
|---|
| 0:01:04 | i think that you want to ask once you got constraints also i'm just sitting | 
|---|
| 0:01:08 | here watching | 
|---|
| 0:01:10 | so that neither idea of the last but not what they are | 
|---|
| 0:01:15 | is that i want to use this also to start, if | 
|---|
| 0:01:20 | possible, some discussion about features in general for speaker recognition | 
|---|
| 0:01:24 | because i think we started that yesterday and i came to realize that we have some | 
|---|
| 0:01:29 | issues, just as in the mainstream | 
|---|
| 0:01:31 | so i have a few slides at the beginning which perhaps | 
|---|
| 0:01:34 | will be a bit more general than what i want to talk about later | 
|---|
| 0:01:39 | and then i'll come back to the paper again | 
|---|
| 0:01:43 | i always like it if you have questions during the presentation | 
|---|
| 0:01:47 | please ask me immediately, don't feel shy | 
|---|
| 0:01:50 | i mean, if we don't get through all the slides | 
|---|
| 0:01:53 | that's fine; the worst case is that everybody sits here and you don't know what i'm talking about | 
|---|
| 0:01:58 | so just keep asking questions or something | 
|---|
| 0:02:02 | so the story is the following | 
|---|
| 0:02:07 | we have speech | 
|---|
| 0:02:09 | and speech carries several streams of information | 
|---|
| 0:02:14 | there is the speaker | 
|---|
| 0:02:18 | there is | 
|---|
| 0:02:19 | of course the environment | 
|---|
| 0:02:21 | and there is the message | 
|---|
| 0:02:23 | and this is what | 
|---|
| 0:02:25 | speech | 
|---|
| 0:02:27 | is primarily carrying, and typically | 
|---|
| 0:02:29 | each of us is after | 
|---|
| 0:02:33 | one of them | 
|---|
| 0:02:35 | so these are really the three sources | 
|---|
| 0:02:37 | if you are after the speaker, the other two, the environment and the message | 
|---|
| 0:02:44 | can be considered a | 
|---|
| 0:02:47 | nuisance | 
|---|
| 0:02:47 | for the speaker task | 
|---|
| 0:02:49 | oh | 
|---|
| 0:02:50 | for speaker recognition there are a number of things | 
|---|
| 0:02:54 | which we may consider as disturbing, as noise | 
|---|
| 0:02:59 | only the speaker matters; the message and the environment are just annoying | 
|---|
| 0:03:07 | sources of information which you would like to be invariant to | 
|---|
| 0:03:15 | they are a part of the signal, but you do not want this information | 
|---|
| 0:03:21 | right | 
|---|
| 0:03:22 | you | 
|---|
| 0:03:23 | and it involves | 
|---|
| 0:03:25 | the analysis, the features | 
|---|
| 0:03:27 | and the classifier | 
|---|
| 0:03:30 | the analysis is the part which we knew | 
|---|
| 0:03:35 | before | 
|---|
| 0:03:37 | we saw the data | 
|---|
| 0:03:38 | this is based on what we learned in school or whatever we got from previous experience | 
|---|
| 0:03:44 | with the data | 
|---|
| 0:03:46 | and then there is a classifier, and the classifier is typically trained | 
|---|
| 0:03:50 | now the distinction between analysis and classification is somehow blurring, because we now also train the feature | 
|---|
| 0:03:58 | extraction | 
|---|
| 0:03:59 | so | 
|---|
| 0:04:01 | that is | 
|---|
| 0:04:03 | some you know like | 
|---|
| 0:04:05 | but as i said, this is exactly what we | 
|---|
| 0:04:10 | did before | 
|---|
| 0:04:12 | also | 
|---|
| 0:04:14 | the outcome of this whole process should be, in our case, the identity of the speaker | 
|---|
| 0:04:19 | right, so the goal of this process | 
|---|
| 0:04:23 | is somehow alleviating the unwanted sources | 
|---|
| 0:04:26 | of information | 
|---|
| 0:04:28 | yeah | 
|---|
| 0:04:29 | and stressing the information about the speaker, so you would like to see an analysis | 
|---|
| 0:04:34 | which somehow suppresses the message | 
|---|
| 0:04:37 | and the influence of the environment and so on, and enhances the information | 
|---|
| 0:04:43 | about who is speaking | 
|---|
| 0:04:46 | but of course what we also learned over the years in speech research | 
|---|
| 0:04:52 | is that very often | 
|---|
| 0:04:54 | what is used is whatever is already around, because that is what | 
|---|
| 0:05:00 | you have or what you can easily get | 
|---|
| 0:05:03 | and so on | 
|---|
| 0:05:05 | yeah | 
|---|
| 0:05:05 | and in speaker recognition we ended up much the same way as we ended up in | 
|---|
| 0:05:12 | speech recognition | 
|---|
| 0:05:13 | we know how to process speech, that is | 
|---|
| 0:05:19 | you take the signal, you warp the frequency axis nonlinearly, and you get | 
|---|
| 0:05:26 | a sequence of vectors, each of them describing the signal in different frequency sub-bands | 
|---|
| 0:05:30 | you more or less | 
|---|
| 0:05:32 | ignore the phase there, and you do it somehow, you know, in quotes | 
|---|
| 0:05:38 | like hearing does, because people believe that hearing is to some extent the first thing that processes | 
|---|
| 0:05:42 | the signal, and its properties, some of its properties, might be useful | 
|---|
| 0:05:48 | and | 
|---|
| 0:05:50 | so that is how we process the signal; the analysis is very much like that | 
|---|
| 0:05:57 | right here | 
|---|
| 0:05:59 | so this typically starts | 
|---|
| 0:06:02 | with the filter bank | 
|---|
| 0:06:03 | and then some modifications, depending on the school of thought | 
|---|
| 0:06:08 | there can be different modifications | 
|---|
| 0:06:11 | the PLP people do different modifications than the MFCC people, and so on, so there are different | 
|---|
| 0:06:18 | schools | 
|---|
| 0:06:19 | and | 
|---|
| 0:06:22 | then we take a cosine transform in all cases, because | 
|---|
| 0:06:27 | after the modifications there is most likely some compression, some kind of logarithm | 
|---|
| 0:06:32 | something happens there, and the cosine transform approximately decorrelates the features | 
|---|
| 0:06:37 | and you get the cepstrum | 
|---|
| 0:06:39 | and the cepstrum is what we are using | 
|---|
| 0:06:42 | both in speech and in speaker recognition, and all of this is inherited | 
|---|
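A minimal sketch of the generic pipeline described above (filter bank on the power spectrum, a compression step, then a cosine transform); the triangular mel filter bank and all parameter values here are illustrative assumptions, not the exact PLP or MFCC recipe:

```python
import numpy as np
from scipy.fft import dct

def mel(f_hz):
    # standard mel warping of the frequency axis
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters equally spaced on the mel axis (illustrative layout)
    edges_mel = np.linspace(0.0, mel(sr / 2.0), n_filters + 2)
    edges_hz = 700.0 * (10.0 ** (edges_mel / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * edges_hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):
            fb[m - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fb[m - 1, k] = (hi - k) / max(hi - mid, 1)
    return fb

def cepstra(signal, sr, n_fft=512, hop=160, n_filters=24, n_ceps=13):
    # frame the signal, take sub-band energies (phase is discarded),
    # compress with a log and decorrelate with a cosine transform
    fb = mel_filterbank(n_filters, n_fft, sr)
    feats = []
    for start in range(0, len(signal) - n_fft, hop):
        frame = signal[start:start + n_fft] * np.hamming(n_fft)
        power = np.abs(np.fft.rfft(frame)) ** 2
        energies = fb @ power
        feats.append(dct(np.log(energies + 1e-10), type=2, norm='ortho')[:n_ceps])
    return np.array(feats)
```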
| 0:06:47 | so that's because | 
|---|
| 0:06:50 | we borrowed this representation | 
|---|
| 0:06:53 | the speaker recognition people borrowed it from speech recognition | 
|---|
| 0:06:59 | the speech recognition people in turn borrowed the representation from the speech coding people | 
|---|
| 0:07:06 | and so on, so basically we are standing | 
|---|
| 0:07:08 | on the shoulders of giants | 
|---|
| 0:07:11 | right so much as i mentioned briefly at the work site you | 
|---|
| 0:07:17 | i | 
|---|
| 0:07:18 | yeah data is actually a slight | 
|---|
| 0:07:24 | online | 
|---|
| 0:07:25 | so what were the sources of variability at that time | 
|---|
| 0:07:31 | so the sources were different channels | 
|---|
| 0:07:35 | what you | 
|---|
| 0:07:36 | right most in one | 
|---|
| 0:07:39 | interspeech case | 
|---|
| 0:07:40 | so we use a set of points | 
|---|
| 0:07:42 | about the speech sound | 
|---|
| 0:07:44 | that's why we shouldn't | 
|---|
| 0:07:48 | what conditions | 
|---|
| 0:07:50 | you | 
|---|
| 0:07:51 | this information which is of course | 
|---|
| 0:07:56 | the design | 
|---|
| 0:07:58 | the most suitable | 
|---|
| 0:08:00 | function | 
|---|
| 0:08:01 | and so forth | 
|---|
| 0:08:03 | this is something you | 
|---|
| 0:08:05 | of course is you don't live which i or typically work will be first thing | 
|---|
| 0:08:11 | this | 
|---|
| 0:08:12 | the but you just changing channels down | 
|---|
| 0:08:17 | a lot of course also the goal | 
|---|
| 0:08:20 | and high basically space exposed | 
|---|
| 0:08:25 | so this is the formation | 
|---|
| 0:08:27 | which i feel E you will be are not speak | 
|---|
| 0:08:32 | yeah so yeah pretty funny "'cause" it's a little late and of course | 
|---|
| 0:08:38 | say | 
|---|
| 0:08:40 | briefly, speaker recognition techniques like the universal background model | 
|---|
| 0:08:46 | joint factor analysis and so on can match speakers | 
|---|
| 0:08:50 | in some cases embarrassingly well | 
|---|
| 0:08:56 | doesn't exist by from the G | 
|---|
| 0:08:59 | so | 
|---|
| 0:09:01 | probably doesn't is not sure that | 
|---|
| 0:09:08 | i | 
|---|
| 0:09:11 | now let's see how much this machinery learns, i mean, from these data | 
|---|
| 0:09:17 | i | 
|---|
| 0:09:20 | so | 
|---|
| 0:09:22 | i | 
|---|
| 0:09:35 | i | 
|---|
| 0:09:40 | i | 
|---|
| 0:09:49 | all | 
|---|
| 0:09:53 | this is so this is like i think that | 
|---|
| 0:09:57 | yeah exactly | 
|---|
| 0:10:01 | my | 
|---|
| 0:10:03 | you know, this is a spectrogram, so it's not the cepstrum, it's the spectrum | 
|---|
| 0:10:07 | so that is because we copied some as well as far as a very fast | 
|---|
| 0:10:15 | where | 
|---|
| 0:10:16 | yeah | 
|---|
| 0:10:19 | firstly | 
|---|
| 0:10:20 | yeah that's | 
|---|
| 0:10:22 | so i think that it might be worthwhile looking back into these | 
|---|
| 0:10:28 | the basic analysis | 
|---|
| 0:10:29 | because we have much more data and very fancy processing techniques; one may | 
|---|
| 0:10:36 | want to know how much | 
|---|
| 0:10:38 | variability there is, yeah, exactly how much variability | 
|---|
| 0:10:43 | i | 
|---|
| 0:10:44 | i | 
|---|
| 0:10:54 | i | 
|---|
| 0:10:55 | i | 
|---|
| 0:11:03 | so | 
|---|
| 0:11:10 | i | 
|---|
| 0:11:11 | i | 
|---|
| 0:11:15 | yes | 
|---|
| 0:11:17 | and the techniques which you can be physical for recognizing speaker actually very much bigger | 
|---|
| 0:11:23 | than | 
|---|
| 0:11:24 | that is | 
|---|
| 0:11:26 | maybe | 
|---|
| 0:11:27 | is it is misleading because we use | 
|---|
| 0:11:31 | when you | 
|---|
| 0:11:32 | speaker dependent on | 
|---|
| 0:11:34 | i | 
|---|
| 0:11:36 | yeah | 
|---|
| 0:11:36 | i | 
|---|
| 0:11:39 | and | 
|---|
| 0:11:40 | what are you want | 
|---|
| 0:11:41 | ask | 
|---|
| 0:11:42 | or maybe sets it they pay cisco phase right | 
|---|
| 0:11:47 | somebody | 
|---|
| 0:11:49 | and | 
|---|
| 0:11:51 | but at the same time i decided that this work | 
|---|
| 0:11:54 | the work on other sources and methods applied | 
|---|
| 0:11:59 | to speech; there might be features more specific for speaker recognition | 
|---|
| 0:12:04 | but that would be another story, so the results | 
|---|
| 0:12:07 | this | 
|---|
| 0:12:08 | i talk about are based on deriving the spectrum, or are focused on it | 
|---|
| 0:12:15 | normally you take a short segment of the signal, you know, a few hundredths of a second | 
|---|
| 0:12:22 | and after some preprocessing you fit an autoregressive model, i mean | 
|---|
| 0:12:28 | and what we get is the log spectrum, and from the model | 
|---|
| 0:12:35 | a spectral | 
|---|
| 0:12:39 | envelope | 
|---|
| 0:12:41 | right | 
|---|
| 0:12:42 | as a sequence, as a function of time | 
|---|
| 0:12:44 | you can also do it differently, and this is what | 
|---|
| 0:12:49 | we are presenting here | 
|---|
| 0:12:51 | you take a relatively long segment of the signal | 
|---|
| 0:12:55 | and do exactly the same thing | 
|---|
| 0:12:58 | but you do it on the cosine transform of the signal | 
|---|
| 0:13:02 | then you are able to derive the model in a particular | 
|---|
| 0:13:09 | frequency band | 
|---|
| 0:13:12 | band by band, and you end up with a time-frequency representation | 
|---|
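A minimal numerical sketch of that idea as I read it: linear prediction applied to the cosine transform of a long segment, band by band, giving a smooth temporal envelope in each sub-band. The band layout, model order and number of output points below are illustrative assumptions, not the paper's recipe (real frequency-domain linear prediction implementations typically use overlapping, tapered windows over the DCT coefficients rather than the rectangular bands used here):

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def lpc(x, order):
    # autocorrelation-method linear prediction
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:order], r[1:order + 1])       # predictor coefficients
    gain = r[0] - a @ r[1:order + 1]                     # residual energy
    return np.concatenate(([1.0], -a)), gain

def fdlp_envelopes(segment, n_bands=8, order=40, n_points=400):
    # cosine transform of the whole (long) segment of signal
    c = dct(np.asarray(segment, dtype=float), type=2, norm='ortho')
    band_len = len(c) // n_bands
    envelopes = []
    for b in range(n_bands):
        sub = c[b * band_len:(b + 1) * band_len]         # DCT coefficients of one band
        a, gain = lpc(sub, order)
        # the AR "spectrum" of frequency-domain coefficients is a smooth
        # estimate of the squared temporal envelope of that sub-band
        w = np.linspace(0.0, np.pi, n_points)
        denom = np.abs(np.polyval(a[::-1], np.exp(-1j * w))) ** 2
        envelopes.append(gain / (denom + 1e-12))
    return np.array(envelopes)    # (n_bands, n_points): a time-frequency representation
```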
| 0:13:18 | just like you for this is that you know i sometimes like all overlay this | 
|---|
| 0:13:25 | is this is a very rich people whose second level or when they do this | 
|---|
| 0:13:30 | i | 
|---|
| 0:13:30 | spectral | 
|---|
| 0:13:32 | and this is maybe more the way hearing is working, because the ear first | 
|---|
| 0:13:37 | splits | 
|---|
| 0:13:38 | the signal, speech or anything else, into | 
|---|
| 0:13:41 | frequency components, and only then | 
|---|
| 0:13:45 | looks at how they evolve in time; so this is the way, if what is important for | 
|---|
| 0:13:50 | you is to somehow get a | 
|---|
| 0:13:53 | global picture this way | 
|---|
| 0:13:55 | start | 
|---|
| 0:13:56 | this | 
|---|
| 0:13:57 | well i enough not be possible at you know which we can see if i | 
|---|
| 0:14:02 | was | 
|---|
| 0:14:03 | which one | 
|---|
| 0:14:05 | if you just look at the picture you might believe me | 
|---|
| 0:14:08 | okay | 
|---|
| 0:14:09 | yeah | 
|---|
| 0:14:10 | so this is what we call frequency domain linear prediction; my graduate students | 
|---|
| 0:14:16 | were calling it FDLP | 
|---|
| 0:14:16 | as opposed to time domain prediction | 
|---|
| 0:14:18 | or to perceptual linear prediction, so this can be set side by side with it | 
|---|
| 0:14:24 | but i think there is quite a bit of perception in it too | 
|---|
| 0:14:28 | it's | 
|---|
| 0:14:30 | as the | 
|---|
| 0:14:31 | so here is one example | 
|---|
| 0:14:34 | we have a signal | 
|---|
| 0:14:36 | you take a frequency band | 
|---|
| 0:14:38 | fit an all-pole model | 
|---|
| 0:14:41 | of its envelope | 
|---|
| 0:14:43 | and you also have the carrier | 
|---|
| 0:14:45 | that is, what is left after | 
|---|
| 0:14:48 | the envelope is taken out | 
|---|
| 0:14:50 | and you do that in different frequency bands, so that | 
|---|
| 0:14:54 | the time domain signal is split into bands | 
|---|
| 0:14:59 | and each frequency band into an envelope and a carrier | 
|---|
| 0:15:05 | so one can resynthesize speech from the envelopes only, and one can also resynthesize speech from the carriers | 
|---|
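A small illustration of that envelope/carrier split, using a generic band-pass filter and the Hilbert transform; the band edges and filter order are arbitrary choices for the sketch, and the talk's actual decomposition comes from the all-pole FDLP model rather than from scipy's Hilbert transform:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_and_carrier(signal, sr, band=(1000.0, 2000.0)):
    # isolate one frequency band, then split it into a slowly varying
    # envelope and a carrier (what is left after the envelope is taken out)
    sos = butter(4, band, btype='bandpass', fs=sr, output='sos')
    sub = sosfiltfilt(sos, np.asarray(signal, dtype=float))
    env = np.abs(hilbert(sub))          # temporal (Hilbert) envelope
    carrier = sub / (env + 1e-12)       # roughly unit-envelope carrier
    return env, carrier
```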
| 0:15:12 | yeah | 
|---|
| 0:15:16 | so if you | 
|---|
| 0:15:18 | the signal | 
|---|
| 0:15:21 | oh | 
|---|
| 0:15:22 | oh search | 
|---|
| 0:15:24 | i | 
|---|
| 0:15:24 | i table you | 
|---|
| 0:15:30 | yeah | 
|---|
| 0:15:32 | i | 
|---|
| 0:15:34 | oh | 
|---|
| 0:15:38 | oh search | 
|---|
| 0:15:41 | i just don't | 
|---|
| 0:15:46 | yeah | 
|---|
| 0:15:49 | and | 
|---|
| 0:15:55 | i | 
|---|
| 0:15:59 | if you where | 
|---|
| 0:16:02 | well i | 
|---|
| 0:16:04 | i | 
|---|
| 0:16:08 | i | 
|---|
| 0:16:09 | that is to send messages because then | 
|---|
| 0:16:14 | thus | 
|---|
| 0:16:15 | speech | 
|---|
| 0:16:17 | but the bottom line here is that | 
|---|
| 0:16:19 | what we thought should not be useful for the speaker actually can be | 
|---|
| 0:16:24 | oh | 
|---|
| 0:16:24 | i | 
|---|
| 0:16:25 | a four or is that actually you know | 
|---|
| 0:16:29 | in some ways | 
|---|
| 0:16:31 | one is some there is a whole | 
|---|
| 0:16:36 | components | 
|---|
| 0:16:37 | yeah | 
|---|
| 0:16:39 | formation | 
|---|
| 0:16:40 | well | 
|---|
| 0:16:41 | also | 
|---|
| 0:16:43 | shen | 
|---|
| 0:16:43 | for | 
|---|
| 0:16:44 | speech | 
|---|
| 0:16:50 | here is that since a young | 
|---|
| 0:16:53 | a simple point here is that you get some | 
|---|
| 0:16:59 | robustness, so you know it's | 
|---|
| 0:17:03 | and you have a representation | 
|---|
| 0:17:06 | yeah | 
|---|
| 0:17:07 | in | 
|---|
| 0:17:07 | so if you have some problem here | 
|---|
| 0:17:13 | say some noise with | 
|---|
| 0:17:15 | high energy in one band, we can see | 
|---|
| 0:17:20 | oh is assumed | 
|---|
| 0:17:25 | so | 
|---|
| 0:17:26 | i mentioned in | 
|---|
| 0:17:30 | so | 
|---|
| 0:17:31 | well as a whole | 
|---|
| 0:17:33 | which i | 
|---|
| 0:17:35 | since so you'll find the right | 
|---|
| 0:17:38 | as i | 
|---|
| 0:17:40 | so if you before | 
|---|
| 0:17:41 | or if you | 
|---|
| 0:17:43 | yeah | 
|---|
| 0:17:45 | different S is divided by S | 
|---|
| 0:17:49 | and this is just a to see this somehow | 
|---|
| 0:17:54 | that's easily different frequencies | 
|---|
| 0:17:57 | depending on the frequency | 
|---|
| 0:17:59 | well | 
|---|
| 0:18:00 | channel, and you can, if you like | 
|---|
| 0:18:03 | look at one of the spectra | 
|---|
| 0:18:07 | to see that it is just a change in the | 
|---|
| 0:18:11 | overall gain of the all-pole model | 
|---|
| 0:18:14 | and that is what, like C0, we essentially just ignore | 
|---|
| 0:18:19 | in this model | 
|---|
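If I read that passage right, the point is that a fixed channel mostly scales a sub-band envelope by a constant gain, and in the log/cepstral domain that only moves the zeroth coefficient, so ignoring the model gain (or c0) removes it. A tiny generic demonstration of that log-domain fact, not code from the paper:

```python
import numpy as np
from scipy.fft import dct

# a fixed channel scales a sub-band envelope by an unknown constant gain g;
# in the log domain that is an additive constant, which only moves the zeroth
# cepstral coefficient, so dropping the gain (or c0) removes the channel effect
env = np.abs(np.random.randn(200)) + 1.0      # some positive envelope
g = 3.7                                        # unknown channel gain
c_clean = dct(np.log(env), type=2, norm='ortho')
c_chan = dct(np.log(g * env), type=2, norm='ortho')
print(np.allclose(c_clean[1:], c_chan[1:]))    # True: only c0 differs
```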
| 0:18:20 | so | 
|---|
| 0:18:22 | thus | 
|---|
| 0:18:23 | well you right side or depending on | 
|---|
| 0:18:27 | oh | 
|---|
| 0:18:28 | oh by the | 
|---|
| 0:18:31 | the signal is you and i think this task to say | 
|---|
| 0:18:36 | then | 
|---|
| 0:18:37 | so i eight | 
|---|
| 0:18:41 | also | 
|---|
| 0:18:42 | oh or similar | 
|---|
| 0:18:44 | you | 
|---|
| 0:18:46 | more | 
|---|
| 0:18:46 | more robust in the presence of additive | 
|---|
| 0:18:49 | noise, that's right | 
|---|
| 0:18:53 | is just a mess | 
|---|
| 0:18:55 | well | 
|---|
| 0:18:56 | then | 
|---|
| 0:18:58 | i | 
|---|
| 0:19:00 | so basically we so people | 
|---|
| 0:19:05 | if you look at more than me importance | 
|---|
| 0:19:11 | well | 
|---|
| 0:19:14 | and | 
|---|
| 0:19:15 | so how many that is more | 
|---|
| 0:19:19 | first thing is that | 
|---|
| 0:19:21 | speech | 
|---|
| 0:19:22 | you | 
|---|
| 0:19:23 | and | 
|---|
| 0:19:24 | these be different frequency ranges | 
|---|
| 0:19:28 | E | 
|---|
| 0:19:29 | try to find | 
|---|
| 0:19:32 | i don't know | 
|---|
| 0:19:34 | well | 
|---|
| 0:19:36 | and also different | 
|---|
| 0:19:39 | this is a state | 
|---|
| 0:19:40 | and then we want to be able to use the standard speaker recognition techniques which | 
|---|
| 0:19:47 | everybody | 
|---|
| 0:19:48 | knows and so on | 
|---|
| 0:19:50 | but for that we need | 
|---|
| 0:19:52 | something which is small | 
|---|
| 0:19:54 | that is, cepstra basically | 
|---|
| 0:19:57 | so we take the cepstrum with respect to frequency | 
|---|
| 0:20:02 | the spectral cepstrum | 
|---|
| 0:20:04 | and also with respect to time, so that we cover all of it | 
|---|
| 0:20:09 | oh | 
|---|
| 0:20:09 | yeah, and then you do this | 
|---|
| 0:20:13 | over time | 
|---|
| 0:20:14 | on this time-frequency representation | 
|---|
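One plausible reading of "cepstra with respect to frequency and with respect to time" is a two-dimensional cosine transform of the log time-frequency envelope; a tiny illustrative sketch under that assumption (the truncation sizes are arbitrary, and this is not necessarily the paper's exact feature):

```python
import numpy as np
from scipy.fft import dct

def two_d_cepstrum(log_tf, n_freq_ceps=13, n_time_ceps=10):
    # log_tf: log time-frequency envelope, shape (n_bands, n_frames)
    c_freq = dct(log_tf, type=2, norm='ortho', axis=0)[:n_freq_ceps]      # cepstrum along frequency
    c_2d = dct(c_freq, type=2, norm='ortho', axis=1)[:, :n_time_ceps]     # cepstrum along time (modulations)
    return c_2d.flatten()    # one small feature vector for the segment
```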
| 0:20:17 | i | 
|---|
| 0:20:17 | this is me | 
|---|
| 0:20:21 | here we already removed | 
|---|
| 0:20:23 | okay some | 
|---|
| 0:20:28 | you | 
|---|
| 0:20:31 | that is | 
|---|
| 0:20:32 | then | 
|---|
| 0:20:33 | yeah | 
|---|
| 0:20:34 | yeah | 
|---|
| 0:20:36 | she is much longer | 
|---|
| 0:20:40 | responsible rule | 
|---|
| 0:20:42 | very short | 
|---|
| 0:20:46 | the communication | 
|---|
| 0:20:48 | so it's yeah | 
|---|
| 0:20:51 | oh that's out | 
|---|
| 0:20:53 | i | 
|---|
| 0:20:53 | style | 
|---|
| 0:20:57 | which i theses so you know | 
|---|
| 0:21:01 | so our | 
|---|
| 0:21:02 | first | 
|---|
| 0:21:08 | yeah i | 
|---|
| 0:21:11 | yeah i think that both my main street | 
|---|
| 0:21:14 | one | 
|---|
| 0:21:16 | yeah | 
|---|
| 0:21:18 | and we also | 
|---|
| 0:21:20 | a false one | 
|---|
| 0:21:31 | i | 
|---|
| 0:21:33 | i | 
|---|
| 0:21:34 | performance | 
|---|
| 0:21:38 | and | 
|---|
| 0:21:39 | oh | 
|---|
| 0:21:41 | this is | 
|---|
| 0:21:42 | both | 
|---|
| 0:21:44 | i | 
|---|
| 0:21:56 | so | 
|---|
| 0:22:01 | again | 
|---|
| 0:22:02 | right | 
|---|
| 0:22:07 | yeah | 
|---|
| 0:22:10 | right | 
|---|
| 0:22:12 | this | 
|---|
| 0:22:14 | so | 
|---|
| 0:22:16 | oh | 
|---|
| 0:22:18 | i | 
|---|
| 0:22:23 | i | 
|---|
| 0:22:26 | i know i was also | 
|---|
| 0:22:30 | i have some | 
|---|
| 0:22:32 | the task | 
|---|
| 0:22:36 | yeah i | 
|---|
| 0:22:38 | same i | 
|---|
| 0:22:39 | i | 
|---|
| 0:22:42 | but that's a | 
|---|
| 0:22:44 | you know | 
|---|
| 0:22:47 | i | 
|---|
| 0:22:48 | yeah | 
|---|
| 0:22:53 | oh | 
|---|
| 0:22:55 | you | 
|---|
| 0:22:56 | right | 
|---|
| 0:23:00 | well | 
|---|
| 0:23:01 | i | 
|---|
| 0:23:04 | yeah | 
|---|
| 0:23:04 | i | 
|---|
| 0:23:08 | oh | 
|---|
| 0:23:09 | oh | 
|---|
| 0:23:13 | i | 
|---|
| 0:23:14 | i | 
|---|
| 0:23:15 | and | 
|---|
| 0:23:16 | well | 
|---|
| 0:23:16 | where | 
|---|
| 0:23:18 | oh | 
|---|
| 0:23:19 | yeah | 
|---|
| 0:23:20 | i can't | 
|---|
| 0:23:22 | yeah | 
|---|
| 0:23:23 | i | 
|---|
| 0:23:24 | i was hoping that are supposed to be | 
|---|
| 0:23:28 | based | 
|---|
| 0:23:30 | oh yeah probably get a degree without so maybe somebody | 
|---|
| 0:23:36 | i think this there's function is expressed here | 
|---|
| 0:23:40 | but at the same time | 
|---|
| 0:23:42 | the features and the classifier, a classifier for speaker recognition | 
|---|
| 0:23:49 | uses all the knowledge in the data; it can take advantage of the fact that different areas | 
|---|
| 0:23:55 | of speech sounds | 
|---|
| 0:23:57 | are handled by different parts of the model and so on and so on | 
|---|
| 0:24:00 | it is interesting | 
|---|
| 0:24:02 | whether the front end takes advantage of that too, as somebody was pointing out | 
|---|
| 0:24:09 | so yeah that is | 
|---|
| 0:24:12 | i | 
|---|
| 0:24:21 | oh | 
|---|
| 0:24:22 | oh | 
|---|
| 0:24:25 | oh | 
|---|
| 0:24:27 | i | 
|---|
| 0:24:30 | i | 
|---|
| 0:24:32 | i | 
|---|
| 0:24:48 | no | 
|---|
| 0:24:51 | it's such that every utterance is about a sentence, so we just take the whole | 
|---|
| 0:24:57 | utterance | 
|---|
| 0:24:58 | if we have a lot of speech it could be chopped into segments of, say, | 
|---|
| 0:25:03 | five seconds, and then, whatever the length of the segment, we always build the | 
|---|
| 0:25:08 | model over the whole segment, right, and we expect | 
|---|
| 0:25:14 | the model to cover that segment of the signal | 
|---|
| 0:25:19 | by the way, if you take the signal, you remove its mean | 
|---|
| 0:25:23 | typically as the first step | 
|---|
| 0:25:26 | because otherwise the model doesn't behave well, as you can check | 
|---|
| 0:25:31 | so we center the signal first | 
|---|
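A minimal sketch of that preprocessing as described in the answer: chop a long recording into roughly five-second segments and remove the mean of each before any modelling (the exact segment length and any further normalization are my assumptions here). Each such segment could then be passed to something like the fdlp_envelopes sketch shown earlier.

```python
import numpy as np

def segments(signal, sr, seg_seconds=5.0):
    # chop a long recording into roughly five-second pieces and centre each one;
    # one model is then built per segment
    seg_len = int(seg_seconds * sr)
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = np.asarray(signal[start:start + seg_len], dtype=float)
        yield seg - seg.mean()            # remove the mean (DC) before modelling
```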
| 0:25:35 | this is | 
|---|
| 0:25:37 | i | 
|---|
| 0:25:40 | i | 
|---|
| 0:25:48 | vol | 
|---|
| 0:25:51 | i didn't say exactly | 
|---|
| 0:25:54 | what i said was that what might be interesting for the speaker is to use the residual | 
|---|
| 0:25:57 | you run | 
|---|
| 0:25:59 | this process, which uses the all-pole model to split the signal into different components, and yeah | 
|---|
| 0:26:05 | the envelope was the one which was used here | 
|---|
| 0:26:09 | and the carrier component was the one which was discarded | 
|---|
| 0:26:11 | but what i found interesting | 
|---|
| 0:26:14 | was that it sounded like the information about the message | 
|---|
| 0:26:21 | was pretty much gone, and what it still had was | 
|---|
| 0:26:25 | just | 
|---|
| 0:26:26 | some information, some information about the speaker | 
|---|
| 0:26:30 | i don't think anybody would assign it to the original | 
|---|
| 0:26:34 | the original speaker though | 
|---|
| 0:26:36 | the other component, the envelope, is used as it is for speech recognition | 
|---|
| 0:26:41 | that component | 
|---|
| 0:26:43 | so we just use it as a speech signal, as an utterance | 
|---|
| 0:26:47 | our phoneme recognizers were getting, what was it, fifty-five or so percent | 
|---|
| 0:26:53 | fifty percent accuracy | 
|---|
| 0:26:55 | so you can run the same machinery on the two | 
|---|
| 0:26:59 | with respect to recognizing phonemes | 
|---|
| 0:27:02 | somebody i you know | 
|---|
| 0:27:17 | i mean, what is happening: at the top the detail is lost and all the formants are gone | 
|---|
| 0:27:24 | and everything is flat | 
|---|
| 0:27:26 | and it is | 
|---|
| 0:27:28 | it's a bit | 
|---|
| 0:27:31 | i way that you don't | 
|---|
| 0:27:32 | the | 
|---|
| 0:27:38 | the only point is that it still turns out to be useful | 
|---|
| 0:27:43 | for telling something about the speaker | 
|---|
| 0:27:48 | i | 
|---|
| 0:27:49 | i | 
|---|
| 0:27:51 | oh of course i mean i see that course yeah so that might be right | 
|---|
| 0:27:58 | i | 
|---|
| 0:27:59 | a | 
|---|
| 0:28:02 | or | 
|---|
| 0:28:03 | i | 
|---|
| 0:28:10 | i | 
|---|
| 0:28:12 | of course, you know, in all these cases you | 
|---|
| 0:28:17 | ask about fusion, right, gluing things together; i once tried | 
|---|
| 0:28:22 | to publish a paper, as a matter of fact it was on speaker recognition | 
|---|
| 0:28:26 | which was called towards decreasing error rates | 
|---|
| 0:28:29 | and one of the reviewers said | 
|---|
| 0:28:32 | if he uses it here and he fuses it | 
|---|
| 0:28:35 | it is not decreasing the error rate on its own | 
|---|
| 0:28:38 | the paper was rejected; so what i am saying is, you know, if | 
|---|
| 0:28:43 | you are working on something new | 
|---|
| 0:28:46 | and of course if you use it on its own it is very likely that your | 
|---|
| 0:28:50 | performance | 
|---|
| 0:28:51 | compared to the others | 
|---|
| 0:28:52 | degrades; that's why our paper was towards decreasing error rates | 
|---|
| 0:28:56 | but now since we have these huge systems | 
|---|
| 0:28:58 | and that was fifteen years ago, people started working on fusion | 
|---|
| 0:29:02 | if you just go for a different source of information, if you have a | 
|---|
| 0:29:07 | different source of information, you are very likely to see the improvement after fusion; you will see | 
|---|
| 0:29:13 | that; that's why, as researchers, when you do things | 
|---|
| 0:29:18 | after the fusion you are very unlikely to increase the error rates, so it will be all right | 
|---|
| 0:29:22 | you want to do something, it doesn't work on its own, and you fuse it with what | 
|---|
| 0:29:26 | works | 
|---|
| 0:29:27 | and you can present it at the conference | 
|---|
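The fusion being discussed is usually just a score-level combination of systems; a minimal sketch with a fixed weighted sum (the weights and the example scores are made up for illustration, not taken from any system in the talk):

```python
import numpy as np

def fuse_scores(scores_a, scores_b, w_a=0.5, w_b=0.5):
    # weighted sum of the per-trial scores of two systems
    return w_a * np.asarray(scores_a) + w_b * np.asarray(scores_b)

baseline = np.array([2.1, -0.7, 1.5])    # e.g. cepstral system scores (made up)
new_feat = np.array([0.4, -1.2, 0.9])    # e.g. new-feature system scores (made up)
print(fuse_scores(baseline, new_feat, w_a=0.7, w_b=0.3))
```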
| 0:29:31 | seven | 
|---|
| 0:29:33 | others | 
|---|