| 0:00:07 | our idea | 
|---|
| 0:00:08 | i am | 
|---|
| 0:00:08 | writing saving | 
|---|
| 0:00:10 | uh | 
|---|
| 0:00:11 | what about um | 
|---|
| 0:00:12 | the research directions that we are working on at the University of Eastern Finland, a lot | 
|---|
| 0:00:17 | is concentrated on | 
|---|
| 0:00:19 | new features for | 
|---|
| 0:00:21 | speaker recognition | 
|---|
| 0:00:23 | uh | 
|---|
| 0:00:23 | it could be i have a | 
|---|
| 0:00:25 | line | 
|---|
| 0:00:25 | three in our laboratory that we are working on um | 
|---|
| 0:00:28 | oh | 
|---|
| 0:00:29 | let me say | 
|---|
| 0:00:29 | i'm a little meeting on and working on it | 
|---|
| 0:00:32 | and exploring the area of new features for | 
|---|
| 0:00:35 | the speaker recognition | 
|---|
| 0:00:37 | now the current work is the application of | 
|---|
| 0:00:41 | some sort of features named | 
|---|
| 0:00:44 | weighted linear prediction features for | 
|---|
| 0:00:46 | the speaker recognition | 
|---|
| 0:00:48 | and uh our work | 
|---|
| 0:00:50 | is | 
|---|
| 0:00:50 | done jointly with | 
|---|
| 0:00:52 | a group at the university in Helsinki that is nowadays called | 
|---|
| 0:00:57 | Aalto University | 
|---|
| 0:00:59 | let me just say that | 
|---|
| 0:01:01 | I'm not pretending that I know | 
|---|
| 0:01:03 | what is happening inside the | 
|---|
| 0:01:06 | weighted linear prediction, I'm just | 
|---|
| 0:01:09 | presenting my understanding of | 
|---|
| 0:01:12 | what weighted linear prediction is | 
|---|
| 0:01:15 | and | 
|---|
| 0:01:15 | uh from the group of older is here i just have to me | 
|---|
| 0:01:19 | to help me if | 
|---|
| 0:01:20 | i cannot describe something to you | 
|---|
| 0:01:23 | then | 
|---|
| 0:01:24 | so the concept | 
|---|
| 0:01:26 | we have | 
|---|
| 0:01:27 | sometimes the customers, or let's say the users, of | 
|---|
| 0:01:30 | speaker recognition technology, | 
|---|
| 0:01:32 | they want to use | 
|---|
| 0:01:33 | speaker recognition when they are | 
|---|
| 0:01:36 | outside of | 
|---|
| 0:01:37 | environment | 
|---|
| 0:01:38 | but we also have some sort of other users of a speaker recognition technology that they want to use it | 
|---|
| 0:01:44 | in any environment in high energy | 
|---|
| 0:01:46 | noise environment in the street | 
|---|
| 0:01:48 | or like | 
|---|
| 0:01:48 | fig track | 
|---|
| 0:01:49 | noise whatever | 
|---|
| 0:01:50 | they are not in office environment | 
|---|
| 0:01:52 | or the way to control | 
|---|
| 0:01:54 | uh and wonderful | 
|---|
| 0:01:55 | speech record | 
|---|
| 0:01:56 | then we are interested to know | 
|---|
| 0:01:58 | how our speaker recognition systems | 
|---|
| 0:02:00 | could | 
|---|
| 0:02:01 | degrade in performance | 
|---|
| 0:02:03 | by having this type of | 
|---|
| 0:02:04 | additive noise | 
|---|
| 0:02:08 | well | 
|---|
| 0:02:08 | just to describe what our focus is | 
|---|
| 0:02:11 | this figure here is | 
|---|
| 0:02:13 | I think | 
|---|
| 0:02:15 | a typical speaker recognition system, which has different | 
|---|
| 0:02:18 | phases and different modules | 
|---|
| 0:02:20 | but our focus | 
|---|
| 0:02:22 | in this study is to | 
|---|
| 0:02:23 | see | 
|---|
| 0:02:24 | how feature extraction | 
|---|
| 0:02:25 | could affect | 
|---|
| 0:02:26 | speaker recognition, | 
|---|
| 0:02:28 | the overall speaker recognition performance | 
|---|
| 0:02:32 | how we typically do | 
|---|
| 0:02:34 | feature extraction | 
|---|
| 0:02:36 | is that | 
|---|
| 0:02:36 | we have windowed frames | 
|---|
| 0:02:38 | we do spectrum estimation | 
|---|
| 0:02:40 | computing the MFCCs, RASTA | 
|---|
| 0:02:42 | filtering | 
|---|
| 0:02:43 | appending deltas and double deltas | 
|---|
| 0:02:45 | frame dropping according to energy | 
|---|
| 0:02:47 | and cepstral mean and variance normalisation, this gives typically a | 
|---|
| 0:02:52 | thirty-six-dimensional feature vector that we have in our experiments but | 
|---|
| 0:02:56 | this is just the baseline | 
|---|
| 0:02:57 | that we have | 
|---|
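
The post-processing steps listed above (deltas and double deltas, energy-based frame dropping, cepstral mean and variance normalisation) can be sketched as follows. This is a minimal illustration and not the speaker's implementation: the function names, the two-point difference used for the deltas, and the 30 dB energy margin are assumptions. The 12 static coefficients in the usage note are also an assumption, chosen only because 12 static plus 12 delta plus 12 double-delta values give the 36 dimensions mentioned in the talk.

```python
import math

def add_deltas(frames):
    """Append first- and second-order differences to each feature vector."""
    def diff(seq):
        # Two-point central difference; the first and last frames are replicated.
        n = len(seq)
        return [[(seq[min(t + 1, n - 1)][d] - seq[max(t - 1, 0)][d]) / 2.0
                 for d in range(len(seq[0]))] for t in range(n)]
    d1 = diff(frames)
    d2 = diff(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]

def drop_low_energy(frames, energies, margin_db=30.0):
    """Keep frames whose log-energy is within margin_db of the loudest frame."""
    log_e = [10.0 * math.log10(e + 1e-12) for e in energies]
    threshold = max(log_e) - margin_db
    return [f for f, le in zip(frames, log_e) if le >= threshold]

def cmvn(frames):
    """Normalise every dimension to zero mean and unit variance."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A constant dimension has zero deviation; fall back to 1.0 to avoid dividing by zero.
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) or 1.0
           for d in range(dim)]
    return [[(f[d] - mean[d]) / std[d] for d in range(dim)] for f in frames]
```

With 12-dimensional static vectors, `cmvn(drop_low_energy(add_deltas(frames), energies))` returns 36-dimensional vectors for the frames that survive the energy test.
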
| 0:02:59 | then | 
|---|
| 0:03:00 | now the question is that | 
|---|
| 0:03:01 | is really | 
|---|
| 0:03:03 | we are all the time using the | 
|---|
| 0:03:05 | FFT | 
|---|
| 0:03:05 | to estimate the spectrum, but is it really | 
|---|
| 0:03:08 | the best way | 
|---|
| 0:03:09 | that we can do it | 
|---|
| 0:03:10 | or | 
|---|
| 0:03:11 | another question is | 
|---|
| 0:03:12 | is it really that robust in additive noise conditions | 
|---|
| 0:03:18 | then, going to LP, | 
|---|
| 0:03:19 | it is | 
|---|
| 0:03:20 | something | 
|---|
| 0:03:21 | well known that | 
|---|
| 0:03:22 | estimating the spectrum could be done by | 
|---|
| 0:03:25 | linear prediction | 
|---|
| 0:03:27 | or by FFT estimation | 
|---|
| 0:03:28 | and they are | 
|---|
| 0:03:30 | just alternative models, alternative ways to estimate the spectrum | 
|---|
| 0:03:34 | nobody says that | 
|---|
| 0:03:36 | LP is better for speaker recognition or that | 
|---|
| 0:03:39 | FFT is better for | 
|---|
| 0:03:40 | speaker recognition, or even for any | 
|---|
| 0:03:42 | other | 
|---|
| 0:03:42 | speech processing application | 
|---|
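
To make the LP alternative to the FFT spectrum concrete, the sketch below fits an all-pole model to one frame with the autocorrelation method (Levinson-Durbin recursion) and evaluates its spectrum. It is a textbook illustration and not the code behind the talk; the function names, the model order and the number of frequency bins are assumptions.

```python
import cmath
import math

def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Solve the LP normal equations.

    Returns the coefficients [1, a1, ..., ap] of A(z) = 1 + sum_k a_k z^-k
    and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def lp_spectrum(a, gain, n_bins):
    """All-pole power spectrum gain / |A(e^{jw})|^2 on n_bins points in [0, pi]."""
    spec = []
    for b in range(n_bins):
        w = math.pi * b / (n_bins - 1)
        resp = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        spec.append(gain / abs(resp) ** 2)
    return spec
```

The resulting envelope is smooth and peaks at the resonances of the model, whereas an FFT magnitude spectrum of the same frame also shows the harmonic fine structure. Either one can be passed on to the mel filterbank and cepstral steps.
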
| 0:03:45 | and that | 
|---|
| 0:03:47 | now | 
|---|
| 0:03:47 | we are trying | 
|---|
| 0:03:48 | to | 
|---|
| 0:03:49 | see | 
|---|
| 0:03:51 | what is the performance of FFT, | 
|---|
| 0:03:54 | LP, | 
|---|
| 0:03:54 | and now, introducing, WLP | 
|---|
| 0:03:57 | the WLP | 
|---|
| 0:03:58 | is | 
|---|
| 0:03:59 | just | 
|---|
| 0:04:00 | targeted to put more stress | 
|---|
| 0:04:03 | on | 
|---|
| 0:04:03 | some regions of | 
|---|
| 0:04:04 | speech that | 
|---|
| 0:04:05 | have, | 
|---|
| 0:04:07 | let me say, they have | 
|---|
| 0:04:09 | more energy | 
|---|
| 0:04:10 | yeah we have | 
|---|
| 0:04:12 | uh | 
|---|
| 0:04:13 | the way is that we are weighting the | 
|---|
| 0:04:15 | energy of the error | 
|---|
| 0:04:16 | by a weighting function, and where the weighting function comes from | 
|---|
| 0:04:20 | is that we are | 
|---|
| 0:04:22 | computing the | 
|---|
| 0:04:28 | yeah we are | 
|---|
| 0:04:30 | we are computing the | 
|---|
| 0:04:32 | weighting function as | 
|---|
| 0:04:34 | the immediate energy of the signal | 
|---|
| 0:04:37 | before the | 
|---|
| 0:04:38 | current sample, something like M samples before the current sample | 
|---|
| 0:04:41 | and put it as the | 
|---|
| 0:04:43 | weighting function | 
|---|
| 0:04:44 | where we are estimating the current sample | 
|---|
| 0:04:46 | for example based on the previous P samples | 
|---|
| 0:04:49 | in this way | 
|---|
| 0:04:50 | it's possible | 
|---|
| 0:04:52 | to again set the derivatives of the | 
|---|
| 0:04:54 | weighted | 
|---|
| 0:04:54 | error with respect | 
|---|
| 0:04:56 | to the | 
|---|
| 0:04:57 | estimator | 
|---|
| 0:04:57 | coefficients to zero and | 
|---|
| 0:04:59 | it leads to the normal WLP | 
|---|
| 0:05:01 | equations | 
|---|
| 0:05:02 | and find | 
|---|
| 0:05:03 | the weights | 
|---|
| 0:05:04 | of the predictor | 
|---|
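
The weighted minimisation described above can be sketched as follows, assuming the error for sample n is e[n] = s[n] - sum_k b[k] * s[n-k] and the weight w[n] is the energy of the M samples preceding n. Setting the derivatives of sum_n w[n] * e[n]^2 to zero gives a small linear system in the predictor coefficients. The function names are hypothetical, and restricting the sum to n = p .. N-1 (no zero padding around the frame) is a simplification made for this sketch and not necessarily what the authors did.

```python
def ste_weights(frame, m):
    """w[n] = sum of the m squared samples preceding n (fewer at the frame start)."""
    return [sum(frame[i] ** 2 for i in range(max(0, n - m), n))
            for n in range(len(frame))]

def solve(mat, rhs):
    """Gaussian elimination with partial pivoting; assumes a non-singular system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def wlp(frame, order, m):
    """Predictor b[1..p] minimising sum_n w[n] * (s[n] - sum_k b[k] s[n-k])^2."""
    w = ste_weights(frame, m)
    idx = range(order, len(frame))
    # Weighted normal equations: sum_k b_k * R[i][k] = r[i] for i = 1..p.
    R = [[sum(w[n] * frame[n - i] * frame[n - k] for n in idx)
          for k in range(1, order + 1)] for i in range(1, order + 1)]
    r = [sum(w[n] * frame[n] * frame[n - i] for n in idx)
         for i in range(1, order + 1)]
    return solve(R, r)
```

Setting every weight to one turns this into unweighted least-squares linear prediction over the same range of samples, which makes the role of the weighting function easy to compare.
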
| 0:05:05 | and uh | 
|---|
| 0:05:06 | it is | 
|---|
| 0:05:07 | maybe the history goes back to nineteen | 
|---|
| 0:05:09 | seventy five and after that | 
|---|
| 0:05:11 | it was activated again in nineteen ninety three, | 
|---|
| 0:05:13 | the weighted linear prediction | 
|---|
| 0:05:16 | but | 
|---|
| 0:05:17 | let's say | 
|---|
| 0:05:18 | why | 
|---|
| 0:05:19 | we are choosing the STE, | 
|---|
| 0:05:21 | short-time energy, | 
|---|
| 0:05:22 | for the weighting function of the WLP | 
|---|
| 0:05:26 | it can be true that | 
|---|
| 0:05:27 | the regions of | 
|---|
| 0:05:28 | speech that have high energy | 
|---|
| 0:05:31 | are less contaminated with additive noise | 
|---|
| 0:05:34 | and uh | 
|---|
| 0:05:35 | it is a | 
|---|
| 0:05:36 | something some uh some sort of | 
|---|
| 0:05:38 | five | 
|---|
| 0:05:39 | but it is known that we can have | 
|---|
| 0:05:42 | a better estimation of the spectrum in the regions of | 
|---|
| 0:05:44 | speech that are | 
|---|
| 0:05:45 | less | 
|---|
| 0:05:46 | corrupted by noise | 
|---|
| 0:05:47 | and these are the regions of | 
|---|
| 0:05:48 | speech | 
|---|
| 0:05:49 | that have | 
|---|
| 0:05:49 | higher | 
|---|
| 0:05:50 | short-time | 
|---|
| 0:05:51 | energy | 
|---|
| 0:05:53 | it corresponds also | 
|---|
| 0:05:55 | to the region of the i mean | 
|---|
| 0:05:57 | when we are talking about the regions of speech that have | 
|---|
| 0:05:59 | higher | 
|---|
| 0:06:00 | short-time energy | 
|---|
| 0:06:02 | it also corresponds to the regions | 
|---|
| 0:06:04 | where | 
|---|
| 0:06:05 | our | 
|---|
| 0:06:06 | glottis is | 
|---|
| 0:06:07 | closed | 
|---|
| 0:06:08 | and the | 
|---|
| 0:06:09 | yeah | 
|---|
| 0:06:09 | subglottal system is disconnected | 
|---|
| 0:06:11 | from the speech production | 
|---|
| 0:06:13 | system | 
|---|
| 0:06:14 | and | 
|---|
| 0:06:14 | in this case we have some standing wave inside our vocal tract | 
|---|
| 0:06:19 | where | 
|---|
| 0:06:19 | if we want to compute | 
|---|
| 0:06:21 | the formants of the | 
|---|
| 0:06:22 | speech signal | 
|---|
| 0:06:23 | we can have more prominent | 
|---|
| 0:06:25 | uh formant | 
|---|
| 0:06:26 | estimation | 
|---|
| 0:06:27 | of that | 
|---|
| 0:06:28 | speech signal | 
|---|
| 0:06:31 | well | 
|---|
| 0:06:32 | if | 
|---|
| 0:06:32 | now what is the problem with WLP | 
|---|
| 0:06:35 | the LP normal equations are somehow guaranteed to lead | 
|---|
| 0:06:38 | to a | 
|---|
| 0:06:39 | stable filter when we are | 
|---|
| 0:06:41 | estimating the coefficients of the predictor | 
|---|
| 0:06:44 | now the problem with WLP is that it is not | 
|---|
| 0:06:46 | sure | 
|---|
| 0:06:47 | to lead to a stable filter | 
|---|
| 0:06:49 | and this is a problem in | 
|---|
| 0:06:50 | speech synthesis | 
|---|
| 0:06:50 | for example | 
|---|
| 0:06:52 | oh how we can | 
|---|
| 0:06:53 | what we can do | 
|---|
| 0:06:55 | is that uh | 
|---|
| 0:06:57 | instead of using | 
|---|
| 0:06:58 | some sort of | 
|---|
| 0:06:59 | weighting function | 
|---|
| 0:07:00 | we can decompose it into partial weights | 
|---|
| 0:07:02 | and apply them | 
|---|
| 0:07:04 | in | 
|---|
| 0:07:04 | this way | 
|---|
| 0:07:05 | to the estimator | 
|---|
| 0:07:06 | after uh | 
|---|
| 0:07:08 | yeah | 
|---|
| 0:07:09 | current sample | 
|---|
| 0:07:10 | and | 
|---|
| 0:07:10 | in this way | 
|---|
| 0:07:11 | we can only | 
|---|
| 0:07:12 | to such equations | 
|---|
| 0:07:14 | that they are derived | 
|---|
| 0:07:15 | in the paper of Magi | 
|---|
| 0:07:17 | and uh | 
|---|
| 0:07:19 | uh | 
|---|
| 0:07:20 | they describe | 
|---|
| 0:07:21 | the behaviour of the | 
|---|
| 0:07:23 | a total weight | 
|---|
| 0:07:24 | i mean these base | 
|---|
| 0:07:26 | in the way | 
|---|
| 0:07:27 | that the | 
|---|
| 0:07:28 | final estimator coefficients should be, | 
|---|
| 0:07:31 | they should be in such a way that they lead to | 
|---|
| 0:07:34 | a stable filter | 
|---|
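
Whether a given set of predictor coefficients yields a stable all-pole filter can be checked without finding polynomial roots, using the step-down (backward Levinson) recursion: the filter 1/A(z) is stable exactly when every reflection coefficient has magnitude below one. The sketch below is a generic check, not the stabilised method from the paper referred to in the talk, and the function names are hypothetical.

```python
def reflection_coefficients(a):
    """Step-down recursion on A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns the reflection coefficients found so far; the recursion stops
    early at the first coefficient with magnitude >= 1, because the next
    step would divide by a non-positive number.
    """
    a = list(a)
    ks = []
    for order in range(len(a) - 1, 0, -1):
        k = a[order]
        ks.append(k)
        if abs(k) >= 1.0:
            break
        a = [1.0] + [(a[j] - k * a[order - j]) / (1.0 - k * k)
                     for j in range(1, order)]
    return ks[::-1]

def is_stable(predictor):
    """True if 1 / (1 - sum_k b_k z^-k) has all of its poles inside the unit circle."""
    a = [1.0] + [-b for b in predictor]
    return all(abs(k) < 1.0 for k in reflection_coefficients(a))
```

For example, `is_stable([1.2, -0.8])` holds because the poles have magnitude sqrt(0.8), while `is_stable([2.0, -1.2])` does not because the product of the two poles is 1.2.
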
| 0:07:36 | well | 
|---|
| 0:07:37 | i'm not | 
|---|
| 0:07:38 | still | 
|---|
| 0:07:38 | understanding completely what's happening here, but in this paper | 
|---|
| 0:07:42 | it is described | 
|---|
| 0:07:44 | and for more details | 
|---|
| 0:07:46 | please | 
|---|
| 0:07:46 | you can refer to that | 
|---|
| 0:07:48 | paper | 
|---|
| 0:07:50 | well here | 
|---|
| 0:07:51 | I'm showing a | 
|---|
| 0:07:52 | frame and | 
|---|
| 0:07:53 | the spectrum estimation of it, a | 
|---|
| 0:07:56 | voiced | 
|---|
| 0:07:56 | frame | 
|---|
| 0:07:57 | from NIST two thousand | 
|---|
| 0:07:59 | two, sorry | 
|---|
| 0:08:01 | and the | 
|---|
| 0:08:02 | somehow | 
|---|
| 0:08:04 | the same frame | 
|---|
| 0:08:05 | that we contaminated with factory noise | 
|---|
| 0:08:07 | at zero dB SNR | 
|---|
| 0:08:09 | it is | 
|---|
| 0:08:11 | it is, let me think, obvious that | 
|---|
| 0:08:12 | uh | 
|---|
| 0:08:13 | uh | 
|---|
| 0:08:14 | when we are doing the | 
|---|
| 0:08:16 | spectrum estimation of the noisy signal | 
|---|
| 0:08:18 | there are | 
|---|
| 0:08:19 | some problems | 
|---|
| 0:08:20 | that | 
|---|
| 0:08:21 | are | 
|---|
| 0:08:21 | mainly caused | 
|---|
| 0:08:22 | by | 
|---|
| 0:08:23 | the | 
|---|
| 0:08:25 | noise signal and | 
|---|
| 0:08:26 | how it affects | 
|---|
| 0:08:27 | depends on the SNR level, it depends on the noise that is added to the | 
|---|
| 0:08:31 | sample | 
|---|
| 0:08:32 | and to give you just more intuition about what is | 
|---|
| 0:08:36 | zero dB factory noise, I have here | 
|---|
| 0:08:38 | yeah | 
|---|
| 0:08:39 | speech file just the | 
|---|
| 0:08:40 | P stuff | 
|---|
| 0:08:41 | speech files that | 
|---|
| 0:08:42 | we do all this | 
|---|
| 0:08:43 | frame | 
|---|
| 0:08:43 | from those people | 
|---|
| 0:08:44 | speech file | 
|---|
| 0:08:49 | a little | 
|---|
| 0:08:51 | it'll the other way | 
|---|
| 0:08:54 | we go real but i don't know what or something | 
|---|
| 0:08:58 | yeah it was a clean sample from the NIST two thousand | 
|---|
| 0:09:01 | two | 
|---|
| 0:09:01 | test set | 
|---|
| 0:09:03 | yeah | 
|---|
| 0:09:04 | yeah | 
|---|
| 0:09:05 | yeah | 
|---|
| 0:09:05 | yeah | 
|---|
| 0:09:06 | the other way | 
|---|
| 0:09:07 | the remote | 
|---|
| 0:09:09 | really | 
|---|
| 0:09:10 | or | 
|---|
| 0:09:11 | yeah | 
|---|
| 0:09:12 | and | 
|---|
| 0:09:13 | the same piece | 
|---|
| 0:09:13 | that we contaminated with zero dB | 
|---|
| 0:09:16 | additive noise | 
|---|
| 0:09:17 | with factory noise | 
|---|
| 0:09:19 | well | 
|---|
| 0:09:20 | now it shows | 
|---|
| 0:09:21 | what is really | 
|---|
| 0:09:23 | meant by zero dB | 
|---|
| 0:09:25 | SNR | 
|---|
| 0:09:26 | yeah | 
|---|
| 0:09:27 | yeah | 
|---|
| 0:09:28 | yeah | 
|---|
| 0:09:28 | yeah | 
|---|
| 0:09:30 | coming to some results | 
|---|
| 0:09:31 | ah | 
|---|
| 0:09:32 | yeah | 
|---|
| 0:09:33 | let me say, we applied the four | 
|---|
| 0:09:35 | spectrum estimation methods | 
|---|
| 0:09:37 | that we are talking about | 
|---|
| 0:09:38 | and used the NIST two thousand two | 
|---|
| 0:09:40 | corpus we had known or has some other type of | 
|---|
| 0:09:43 | speaker detection | 
|---|
| 0:09:44 | and using factory noise then | 
|---|
| 0:09:46 | at zero dB | 
|---|
| 0:09:48 | SNR | 
|---|
| 0:09:49 | here we can see that | 
|---|
| 0:09:50 | the methods are mainly grouped into | 
|---|
| 0:09:53 | two: | 
|---|
| 0:09:54 | FFT | 
|---|
| 0:09:54 | and LP | 
|---|
| 0:09:55 | and, let me say, the weighted | 
|---|
| 0:09:57 | LP group, | 
|---|
| 0:09:59 | WLP itself | 
|---|
| 0:09:59 | and | 
|---|
| 0:10:00 | SWLP | 
|---|
| 0:10:02 | yeah | 
|---|
| 0:10:03 | I should mention that NIST two thousand two | 
|---|
| 0:10:05 | is a | 
|---|
| 0:10:07 | database collected in | 
|---|
| 0:10:10 | um | 
|---|
| 0:10:11 | uh | 
|---|
| 0:10:12 | that | 
|---|
| 0:10:13 | mobile handsets mainly | 
|---|
| 0:10:15 | and it includes | 
|---|
| 0:10:16 | inside | 
|---|
| 0:10:17 | some convolutional noise and some additive noise | 
|---|
| 0:10:20 | although we are adding the additive noise | 
|---|
| 0:10:23 | ourselves | 
|---|
| 0:10:25 | yeah | 
|---|
| 0:10:26 | we can | 
|---|
| 0:10:27 | see | 
|---|
| 0:10:27 | that there is really some difference between the performance of | 
|---|
| 0:10:31 | these features | 
|---|
| 0:10:32 | in additive noise environment | 
|---|
| 0:10:36 | we then tried | 
|---|
| 0:10:37 | some | 
|---|
| 0:10:38 | just | 
|---|
| 0:10:38 | let me say one | 
|---|
| 0:10:39 | very famous | 
|---|
| 0:10:40 | speech enhancement method | 
|---|
| 0:10:42 | and | 
|---|
| 0:10:43 | used it as | 
|---|
| 0:10:44 | just an added block | 
|---|
| 0:10:46 | in our feature extraction | 
|---|
| 0:10:47 | to see what | 
|---|
| 0:10:48 | really one simple | 
|---|
| 0:10:50 | speech enhancement | 
|---|
| 0:10:51 | method | 
|---|
| 0:10:52 | adds to | 
|---|
| 0:10:53 | a speaker recognition system in an additive noise environment | 
|---|
| 0:10:56 | and | 
|---|
| 0:10:57 | looking at the results | 
|---|
| 0:10:58 | it shows that yes there is | 
|---|
| 0:11:00 | uh some | 
|---|
| 0:11:01 | good improvement | 
|---|
| 0:11:03 | based on | 
|---|
| 0:11:04 | having a speech | 
|---|
| 0:11:06 | enhancement, or let's say spectral | 
|---|
| 0:11:08 | subtraction, | 
|---|
| 0:11:09 | algorithm | 
|---|
| 0:11:10 | but | 
|---|
| 0:11:11 | uh these results | 
|---|
| 0:11:12 | although they are too much different but | 
|---|
| 0:11:14 | i should say that | 
|---|
| 0:11:15 | uh our | 
|---|
| 0:11:17 | uh | 
|---|
| 0:11:18 | noise | 
|---|
| 0:11:19 | is | 
|---|
| 0:11:19 | stationary, more or less | 
|---|
| 0:11:20 | and in the real world it is | 
|---|
| 0:11:23 | not really the case | 
|---|
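
A basic form of magnitude spectral subtraction, the kind of enhancement block discussed above, can be sketched as follows. The talk does not say which variant was used, so this is only an illustration: the function name, the use of the first few frames as a noise-only estimate, and the spectral floor are all assumptions. The fixed noise estimate also makes the stationarity assumption explicit, which is the caveat raised by the speaker.

```python
def spectral_subtraction(mag_frames, noise_frames=5, floor=0.01):
    """Subtract an average noise magnitude spectrum from every frame.

    mag_frames: list of per-frame magnitude spectra (lists of equal length).
    The first noise_frames frames are assumed to contain noise only.
    Each bin is floored at floor * original magnitude to avoid negative values.
    """
    n_bins = len(mag_frames[0])
    head = mag_frames[:noise_frames]
    noise = [sum(f[b] for f in head) / len(head) for b in range(n_bins)]
    return [[max(f[b] - noise[b], floor * f[b]) for b in range(n_bins)]
            for f in mag_frames]
```

The cleaned magnitudes would then replace the noisy ones before the filterbank and cepstral steps. With non-stationary noise the single averaged estimate no longer matches each frame, and the benefit is expected to shrink.
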
| 0:11:26 | coming | 
|---|
| 0:11:27 | to some | 
|---|
| 0:11:27 | more recent data, | 
|---|
| 0:11:29 | we want here | 
|---|
| 0:11:30 | to see | 
|---|
| 0:11:30 | whether | 
|---|
| 0:11:31 | these results from NIST two thousand two generalise to NIST two thousand | 
|---|
| 0:11:35 | eight and maybe NIST two thousand | 
|---|
| 0:11:37 | ten, because | 
|---|
| 0:11:38 | we were one of the sites in the I4U submission to the NIST | 
|---|
| 0:11:41 | two thousand | 
|---|
| 0:11:42 | ten SRE and this was | 
|---|
| 0:11:44 | our | 
|---|
| 0:11:44 | base system, I mean the contribution of our | 
|---|
| 0:11:47 | University of Eastern Finland was | 
|---|
| 0:11:49 | trying some | 
|---|
| 0:11:50 | new features | 
|---|
| 0:11:52 | and it's | 
|---|
| 0:11:52 | for speaker recognition | 
|---|
| 0:11:54 | looking at the results | 
|---|
| 0:11:56 | let me see | 
|---|
| 0:11:57 | just | 
|---|
| 0:11:58 | somehow | 
|---|
| 0:11:59 | how them | 
|---|
| 0:12:00 | group | 
|---|
| 0:12:02 | the system here is | 
|---|
| 0:12:03 | 'cause that's where we are with | 
|---|
| 0:12:05 | an A P | 
|---|
| 0:12:06 | and the condition is | 
|---|
| 0:12:07 | eight content second if you ask me why it contents that can be selected for | 
|---|
| 0:12:11 | evaluation 'cause i was working on a forecast for | 
|---|
| 0:12:14 | the speaker recognition and this was something | 
|---|
| 0:12:17 | well let me say | 
|---|
| 0:12:18 | somehow it has some metric nice | 
|---|
| 0:12:20 | how to and i selected here | 
|---|
| 0:12:22 | for the presentation but | 
|---|
| 0:12:23 | if uh | 
|---|
| 0:12:24 | we | 
|---|
| 0:12:24 | look at the other | 
|---|
| 0:12:26 | core test | 
|---|
| 0:12:26 | also | 
|---|
| 0:12:27 | they have this | 
|---|
| 0:12:28 | same | 
|---|
| 0:12:29 | uh interpretation | 
|---|
| 0:12:32 | looking at the results of any weed out any P | 
|---|
| 0:12:35 | it says that uh | 
|---|
| 0:12:36 | uh | 
|---|
| 0:12:37 | the SWLP | 
|---|
| 0:12:38 | based results, | 
|---|
| 0:12:39 | they are improving | 
|---|
| 0:12:41 | the DET curve | 
|---|
| 0:12:42 | in all the | 
|---|
| 0:12:44 | area, if | 
|---|
| 0:12:44 | I | 
|---|
| 0:12:45 | read the results correctly | 
|---|
| 0:12:47 | uh | 
|---|
| 0:12:48 | min DCF and equal error rate, | 
|---|
| 0:12:51 | SWLP is improving compared to | 
|---|
| 0:12:54 | yeah | 
|---|
| 0:12:55 | MFCC. here the red curves are for | 
|---|
| 0:12:57 | males, the blue ones are for females and the green one is for | 
|---|
| 0:13:03 | let me say | 
|---|
| 0:13:04 | all trials male and female | 
|---|
| 0:13:07 | coming to the results | 
|---|
| 0:13:09 | with any any | 
|---|
| 0:13:10 | the effect of using SWLP | 
|---|
| 0:13:13 | is somehow rotating the DET curve in some sense, because | 
|---|
| 0:13:17 | min DCF | 
|---|
| 0:13:17 | is getting a bit better but | 
|---|
| 0:13:19 | equal error rate | 
|---|
| 0:13:20 | gets a bit worse | 
|---|
| 0:13:22 | but | 
|---|
| 0:13:22 | if | 
|---|
| 0:13:23 | uh | 
|---|
| 0:13:25 | you ask | 
|---|
| 0:13:25 | why | 
|---|
| 0:13:26 | it is happening | 
|---|
| 0:13:27 | I have no idea right now | 
|---|
| 0:13:29 | we just applied | 
|---|
| 0:13:30 | this | 
|---|
| 0:13:31 | SWLP and | 
|---|
| 0:13:33 | uh we try | 
|---|
| 0:13:34 | time | 
|---|
| 0:13:34 | effect | 
|---|
| 0:13:35 | but | 
|---|
| 0:13:36 | coming to the interpretation of | 
|---|
| 0:13:37 | why it happens, it needs more study | 
|---|
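
The two figures of merit mentioned above sit at different operating points of the DET curve, which is why one can improve while the other degrades. The sketch below computes both from raw trial scores. The function names are hypothetical, and the default cost parameters are assumptions for illustration; they should be replaced by the values in the plan of the evaluation being reported.

```python
def error_rates(target_scores, nontarget_scores):
    """(threshold, miss rate, false-alarm rate) for every candidate threshold.

    A trial is accepted when its score is >= threshold. An infinite threshold
    is included so that the reject-everything operating point is covered.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]
    points = []
    for thr in thresholds:
        p_miss = sum(s < thr for s in target_scores) / len(target_scores)
        p_fa = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        points.append((thr, p_miss, p_fa))
    return points

def eer(points):
    """Approximate equal error rate: mean of the two rates where they are closest."""
    _, p_miss, p_fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2.0

def min_dcf(points, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Minimum over thresholds of the detection cost function (unnormalised)."""
    return min(c_miss * p_target * pm + c_fa * (1.0 - p_target) * pf
               for _, pm, pf in points)
```

With a small target prior, the minimum cost is usually reached in the low false-alarm region, far from the point where the two error rates are equal. A change that tilts the curve around some point between them can therefore lower min DCF and raise EER at the same time.
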
| 0:13:41 | well | 
|---|
| 0:13:43 | i think | 
|---|
| 0:13:44 | yes | 
|---|
| 0:13:45 | this was the point that i want to | 
|---|
| 0:13:46 | oh | 
|---|
| 0:13:47 | thank you | 
|---|
| 0:13:57 | okay questions we have the whole question could've | 
|---|
| 0:14:14 | just click less know that yes signal to noise ratio you use on the inside | 
|---|
| 0:14:19 | T | 
|---|
| 0:14:20 | no matter | 
|---|
| 0:14:21 | yeah and you also had yeah we'll deal | 
|---|
| 0:14:25 | in the you mentioned that and that you know performance was supported in the two hundred zero D B | 
|---|
| 0:14:32 | yes | 
|---|
| 0:14:33 | um | 
|---|
| 0:14:34 | my question is how did you measure your signal-to-noise ratio, because, you know, | 
|---|
| 0:14:40 | what do you | 
|---|
| 0:14:42 | it sounded as yeah maybe it's me maybe | 
|---|
| 0:14:46 | he | 
|---|
| 0:14:46 | other people may not agree with the | 
|---|
| 0:14:48 | i thought i think that's the most signals so therefore maybe that's not zero D B maybe i did that | 
|---|
| 0:14:54 | idea | 
|---|
| 0:14:55 | women tend either minus ten | 
|---|
| 0:14:57 | higher | 
|---|
| 0:14:58 | i don't noise in it | 
|---|
| 0:14:59 | and then i mean uh yeah yeah uh i thought | 
|---|
| 0:15:02 | the editorial you display | 
|---|
| 0:15:05 | that you called not zero D B | 
|---|
| 0:15:07 | sounded is in | 
|---|
| 0:15:08 | the signal is only the stronger than uh you know zero D B situation | 
|---|
| 0:15:12 | well because i was suspected that somebody will ask how i'm like that with exactly the matlab code that you | 
|---|
| 0:15:17 | have to get yeah | 
|---|
| 0:15:18 | I can interpret here that we are measuring the energy of every frame of the | 
|---|
| 0:15:23 | speech signal and averaging them | 
|---|
| 0:15:25 | over the | 
|---|
| 0:15:26 | signal and over the noise, and putting in the | 
|---|
| 0:15:30 | SNR | 
|---|
| 0:15:32 | SNR here | 
|---|
| 0:15:33 | yeah | 
|---|
| 0:15:34 | to get the gain | 
|---|
| 0:15:35 | and then | 
|---|
| 0:15:36 | we add the signal together with | 
|---|
| 0:15:38 | the noise | 
|---|
| 0:15:39 | and we get the signal together with the noise at what we have as | 
|---|
| 0:15:42 | average SNR | 
|---|
| 0:15:45 | so | 
|---|
| 0:15:47 | you are measuring signal to noise | 
|---|
| 0:15:49 | yeah | 
|---|
| 0:15:49 | so by using that intense | 
|---|
| 0:15:52 | the | 
|---|
| 0:15:53 | uh rather than uh no i'm pretty you know the | 
|---|
| 0:15:57 | yeah, framing the signal and measuring the energy of the | 
|---|
| 0:16:00 | uh | 
|---|
| 0:16:00 | frames | 
|---|
| 0:16:01 | and averaging them over the | 
|---|
| 0:16:03 | signal | 
|---|
| 0:16:03 | and uh | 
|---|
| 0:16:04 | okay uh | 
|---|
| 0:16:05 | finding the | 
|---|
| 0:16:06 | relative gain between the noise and signal | 
|---|
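
The procedure described in this exchange (frame the signal, average the frame energies, derive a gain for the noise, add) can be sketched as follows. This is not the MATLAB code mentioned by the speaker; the function names, the non-overlapping framing and the frame length of 160 samples are assumptions.

```python
import math

def frame_energy_mean(signal, frame_len):
    """Average energy of the consecutive non-overlapping frames of a signal."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return sum(sum(x * x for x in f) for f in frames) / len(frames)

def mix_at_snr(speech, noise, snr_db, frame_len=160):
    """Scale the noise so that the average-energy ratio equals snr_db, then add it.

    The noise must be at least as long as the speech signal.
    """
    e_speech = frame_energy_mean(speech, frame_len)
    e_noise = frame_energy_mean(noise[:len(speech)], frame_len)
    gain = math.sqrt(e_speech / (e_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]
```

Because the speech energy is averaged over all frames, pauses pull the average down, so the ratio inside the active speech segments is higher than the nominal value. That may be one reason a sample labelled zero dB can sound cleaner than expected.
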
| 0:16:24 | i don't see any difference in these | 
|---|
| 0:16:26 | ah | 
|---|
| 0:16:29 | what kind of difference do you expect to see | 
|---|
| 0:16:31 | well, if that's noisy, I expected the spectrum to | 
|---|
| 0:16:34 | be | 
|---|
| 0:16:35 | flatter and filled in with | 
|---|
| 0:16:37 | noise | 
|---|
| 0:16:38 | i mean this is a | 
|---|
| 0:16:39 | this looks | 
|---|
| 0:16:39 | because | 
|---|
| 0:16:41 | yeah, this depends on the noise | 
|---|
| 0:16:43 | because this is factory noise, | 
|---|
| 0:16:44 | the noise that we use here | 
|---|
| 0:16:47 | is just factory noise | 
|---|
| 0:16:48 | i | 
|---|
| 0:16:48 | these | 
|---|
| 0:16:48 | right | 
|---|
| 0:16:49 | just | 
|---|
| 0:16:50 | uh i had these | 
|---|
| 0:16:50 | type of behaviour we just selected one right | 
|---|
| 0:16:53 | the effect of noise is not the same for all frames maybe you're right because | 
|---|
| 0:16:57 | the I S P X | 
|---|
| 0:16:58 | right | 
|---|
| 0:16:58 | but I think that by increasing the noise level, the spectrum | 
|---|
| 0:17:02 | gets more and more flat and we are losing the information | 
|---|
| 0:17:06 | in the spectrum but just some typical example to show | 
|---|
| 0:17:10 | how it works | 
|---|
| 0:17:28 | the other questions | 
|---|
| 0:17:30 | two questions | 
|---|
| 0:17:31 | we have it or not | 
|---|
| 0:17:32 | but may get one more interpretation that | 
|---|
| 0:17:35 | we used this SWLP in conjunction with MFCC and other features | 
|---|
| 0:17:40 | in the I4U submission | 
|---|
| 0:17:42 | is that we | 
|---|
| 0:17:43 | right somehow evaluated our system | 
|---|
| 0:17:45 | or just yeah he uh | 
|---|
| 0:17:47 | feature | 
|---|
| 0:17:48 | and then | 
|---|
| 0:17:48 | i mean | 
|---|
| 0:17:49 | score four | 
|---|
| 0:17:50 | so | 
|---|
| 0:17:50 | subsystem | 
|---|
| 0:17:51 | they use the other side | 
|---|
| 0:17:53 | sensing i for you and taking | 
|---|
| 0:17:55 | uh uh let me say | 
|---|
| 0:17:57 | uh | 
|---|
| 0:17:58 | using uh me | 
|---|
| 0:17:59 | having | 
|---|
| 0:18:00 | this type of | 
|---|
| 0:18:01 | them that they are | 
|---|
| 0:18:02 | uh ultra wide | 
|---|
| 0:18:04 | beat | 
|---|
| 0:18:05 | S A P | 
|---|
| 0:18:06 | a different type of | 
|---|
| 0:18:07 | score | 
|---|
| 0:18:08 | speaker | 
|---|
| 0:18:10 | or just | 
|---|
| 0:18:11 | one of the assumptions | 
|---|
| 0:18:12 | um | 
|---|
| 0:18:13 | for your model is that the | 
|---|
| 0:18:15 | higher-energy | 
|---|
| 0:18:17 | observations in the signal | 
|---|
| 0:18:19 | are the most reliable, right | 
|---|
| 0:18:21 | that's right we have a good because that has the energy of the noise increasing | 
|---|
| 0:18:25 | they could be a really just | 
|---|
| 0:18:27 | uh | 
|---|
| 0:18:28 | uh coloured by the noise | 
|---|
| 0:18:30 | okay i just | 
|---|
| 0:18:31 | the the other side of the uh the the body is also | 
|---|
| 0:18:35 | uh if you don't um | 
|---|
| 0:18:37 | uh | 
|---|
| 0:18:37 | the situation where you getting distortions because | 
|---|
| 0:18:40 | uh | 
|---|
| 0:18:40 | or are driving the channel for example | 
|---|
| 0:18:43 | um and it may be the case where the signal is actually one time | 
|---|
| 0:18:48 | and then you | 
|---|
| 0:18:49 | the | 
|---|
| 0:18:49 | could be | 
|---|
| 0:18:50 | um but maybe another | 
|---|
| 0:18:53 | indicated | 
|---|
| 0:18:53 | silver jews | 
|---|
| 0:18:54 | the work | 
|---|
| 0:18:54 | syllable are energy | 
|---|
| 0:18:56 | um observations | 
|---|
| 0:18:58 | well in this case you're right uh | 
|---|
| 0:19:00 | we don't know exactly what will happen if the signal is distorted | 
|---|
| 0:19:03 | by the channel, by the recording device or | 
|---|
| 0:19:07 | what about this | 
|---|
| 0:19:08 | formance us are just somehow done with the uh | 
|---|
| 0:19:12 | sounds and that | 
|---|
| 0:19:13 | uh | 
|---|
| 0:19:13 | all the signal exactly but if you ask me what will happen if all the signals here | 
|---|
| 0:19:18 | i will say that | 
|---|
| 0:19:19 | uh i think | 
|---|
| 0:19:20 | after the spectral L P spectrum that all | 
|---|
| 0:19:23 | fig | 
|---|
| 0:19:23 | the same way that we hope U S we'll get the fate | 
|---|
| 0:19:28 | really thank you very much | 
|---|