| 0:00:14 | Hello everyone. |
|---|
| 0:00:15 | I am [inaudible], a student [inaudible]. |
|---|
| 0:00:18 | [inaudible] |
|---|
| 0:00:22 | [inaudible] for automatic speaker verification. |
|---|
| 0:00:28 | So, here is the outline. First, I will present the motivation |
|---|
| 0:00:33 | behind this work and introduce our hypothesis. |
|---|
| 0:00:36 | After that, I will talk about constant Q cepstral coefficients (CQCCs). |
|---|
| 0:00:39 | Then I will describe [inaudible] |
|---|
| 0:00:43 | and the sub-band analysis, the analysis of spectral and temporal resolution, and the baseline [inaudible]. |
|---|
| 0:00:52 | And finally, [inaudible]. |
|---|
| 0:01:00 | So, next, the motivation. [inaudible] |
|---|
| 0:01:05 | There is [inaudible] useful [inaudible] to find out what [inaudible] |
|---|
| 0:01:09 | by different frequency information, meaning |
|---|
| 0:01:12 | what [inaudible] |
|---|
| 0:01:15 | for [inaudible] speech. |
|---|
| 0:01:17 | So we [inaudible] |
|---|
| 0:01:19 | [inaudible] |
|---|
| 0:01:21 | [inaudible] signal, as this might help us to discriminate |
|---|
| 0:01:26 | [inaudible] speech, |
|---|
| 0:01:28 | and that kind of understanding |
|---|
| 0:01:31 | allows us to design better and more reliable [inaudible] detection. |
|---|
| 0:01:37 | We got this motivation when we [inaudible] |
|---|
| 0:01:41 | [inaudible] different |
|---|
| 0:01:43 | front ends. The performance [inaudible] |
|---|
| 0:01:49 | considered. |
|---|
| 0:01:51 | [inaudible], which is especially effective in detecting [inaudible], |
|---|
| 0:01:55 | [inaudible] is less effective in detecting [inaudible]. |
|---|
| 0:02:00 | [inaudible] |
|---|
| 0:02:01 | [inaudible] |
|---|
| 0:02:04 | CQCCs as well as [inaudible]: the equal error rates |
|---|
| 0:02:07 | vary substantially. |
|---|
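The equal error rate (EER) is the operating point at which the false acceptance rate for spoofed trials equals the false rejection rate for bona fide trials. Below is a minimal sketch. It assumes that a higher countermeasure score means "more likely bona fide", and it uses a plain threshold sweep rather than any official challenge scoring script.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    Assumed convention: a higher score means "more likely bona fide".
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        # false rejection: bona fide trials scored below the threshold
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # false acceptance: spoofed trials scored at or above the threshold
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (frr + far) / 2, thr
    return best_eer, best_thr


eer, thr = compute_eer([0.3, 0.6, 0.8, 0.9], [0.1, 0.2, 0.4, 0.7])
print(f"EER = {eer:.2%} at threshold {thr}")
```

The sweep picks the observed threshold at which the two error rates are closest and reports their mean. Evaluation toolkits may interpolate between operating points instead, so small differences from official figures are expected.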
| 0:02:10 | so |
|---|
| 0:02:11 | Similar observations regarding [inaudible] differences [inaudible] |
|---|
| 0:02:17 | [inaudible] challenge |
|---|
| 0:02:19 | [inaudible] CQCC front end [inaudible]. |
|---|
| 0:02:23 | [inaudible] |
|---|
| 0:02:25 | [inaudible] |
|---|
| 0:02:27 | [inaudible] |
|---|
| 0:02:29 | Why do CQCCs perform differently [inaudible]? |
|---|
| 0:02:35 | [inaudible] |
|---|
| 0:02:36 | [inaudible] |
|---|
| 0:02:38 | As we know, [inaudible] can be localised in the spectrum, for example in high |
|---|
| 0:02:43 | [inaudible] |
|---|
| 0:02:45 | [inaudible] or wherever. |
|---|
| 0:02:47 | So the use of [inaudible] analysis [inaudible] information across different [inaudible] |
|---|
| 0:02:53 | and [inaudible] localised information, |
|---|
| 0:02:56 | and there is no degradation in performance. |
|---|
| 0:02:59 | So we can see |
|---|
| 0:03:01 | more reliable detection with features for which precise information is available. |
|---|
| 0:03:09 | Now let us discuss the differences [inaudible]. As you know, in the |
|---|
| 0:03:15 | [inaudible] linearly [inaudible], and the window is [inaudible] |
|---|
| 0:03:20 | [inaudible] |
|---|
| 0:03:21 | [inaudible] |
|---|
| 0:03:24 | and temporal resolution |
|---|
| 0:03:26 | [inaudible] |
|---|
| 0:03:27 | [inaudible] |
|---|
| 0:03:30 | In contrast, the CQT has variable spectral and temporal resolution. |
|---|
| 0:03:37 | [inaudible] |
|---|
| 0:03:43 | [inaudible] |
|---|
| 0:03:44 | [inaudible] |
|---|
| 0:03:45 | It gives a higher spectral resolution at lower frequencies and a higher temporal resolution at |
|---|
| 0:03:54 | the |
|---|
| 0:03:54 | higher frequencies. That means |
|---|
| 0:03:56 | the CQT [inaudible] resolution [inaudible]. |
|---|
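The resolution trade-off described above follows from the usual constant-Q relations. Bin centres are f_k = f_min * 2^(k/B), a single quality factor Q = 1/(2^(1/B) - 1) applies to every bin, and window lengths are N_k = Q * f_s / f_k. The sketch below tabulates these relations next to a fixed-window STFT. The 16 kHz sampling rate, B = 96 bins per octave and the 16 Hz to 8 kHz range are illustrative assumptions. They are not necessarily the settings used in this work.

```python
import math


def constant_q_bins(f_min, f_max, bins_per_octave, fs):
    """Return Q and (centre frequency, bandwidth, window length) for each bin."""
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)        # identical for every bin
    n_bins = math.ceil(bins_per_octave * math.log2(f_max / f_min))
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)         # geometric spacing
        bandwidth = f_k / q                               # grows with frequency
        n_k = math.ceil(q * fs / f_k)                     # window shrinks with frequency
        bins.append((f_k, bandwidth, n_k))
    return q, bins


def stft_resolution(n_fft, fs):
    """Fixed-window STFT: one bin spacing (Hz) and one window duration (s) for all bins."""
    return fs / n_fft, n_fft / fs


# Illustrative parameter values, not taken from the talk.
q, bins = constant_q_bins(f_min=16.0, f_max=8000.0, bins_per_octave=96, fs=16000)
low, high = bins[0], bins[-1]
print(f"Q = {q:.1f}")
print(f"lowest bin : {low[0]:7.1f} Hz, bandwidth {low[1]:.3f} Hz, window {low[2] / 16000:.3f} s")
print(f"highest bin: {high[0]:7.1f} Hz, bandwidth {high[1]:.1f} Hz, window {high[2] / 16000:.4f} s")
print("STFT (512-point):", stft_resolution(512, 16000))
```

Because Q is constant, the bandwidth grows and the window shrinks as frequency rises. The STFT, by contrast, assigns one bin spacing and one window duration to every frequency.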
| 0:04:04 | Now, |
|---|
| 0:04:05 | in this slide we will [inaudible] |
|---|
| 0:04:09 | [inaudible] resolution using the constant Q transform. |
|---|
| 0:04:12 | So, |
|---|
| 0:04:13 | given [inaudible], as the first step of feature extraction we apply the |
|---|
| 0:04:19 | constant Q transform, [inaudible] the Q factor. |
|---|
| 0:04:22 | To illustrate, the Q factor is basically [inaudible] for the spectral |
|---|
| 0:04:28 | resolution of this filter. |
|---|
| 0:04:30 | [inaudible] in the frequency domain, |
|---|
| 0:04:34 | we then [inaudible] the power spectral density, |
|---|
| 0:04:39 | [inaudible] |
|---|
| 0:04:43 | [inaudible] |
|---|
| 0:04:46 | [inaudible] information across the [inaudible]. |
|---|
| 0:04:50 | And finally, to obtain the cepstral coefficients, |
|---|
| 0:04:54 | we apply the discrete cosine transform after uniform resampling. |
|---|
| 0:04:58 | This is how we obtain the constant Q cepstral coefficient features. |
|---|
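The extraction steps listed here are the constant Q transform, the power spectrum, the logarithm, uniform resampling and the discrete cosine transform. They can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the reference CQCC implementation:

- The CQT is evaluated naively, bin by bin, with Hann-windowed complex exponentials.
- Uniform resampling is plain linear interpolation onto an evenly spaced frequency grid.
- All parameter values are hypothetical: f_min = 100 Hz, 12 bins per octave, 128 resampled points and 20 coefficients.

```python
import numpy as np


def naive_cqt_frame(x, centre, fs, f_min, n_bins, bins_per_octave):
    """Constant-Q transform of one frame, evaluated directly bin by bin (slow but explicit)."""
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    out = np.zeros(n_bins, dtype=complex)
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)
        n_k = int(np.ceil(q * fs / f_k))               # longer windows at low frequencies
        start = centre - n_k // 2
        seg = np.zeros(n_k)
        lo, hi = max(start, 0), min(start + n_k, len(x))
        if hi > lo:                                     # zero-pad where the window leaves the signal
            seg[lo - start:hi - start] = x[lo:hi]
        n = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f_k * n / fs)
        out[k] = np.dot(seg, kernel) / n_k
    return out


def dct_ii(v, n_coeffs):
    """Orthonormal DCT-II, written out to avoid a SciPy dependency."""
    n = len(v)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    coeffs = np.sqrt(2.0 / n) * (basis @ v)
    coeffs[0] /= np.sqrt(2.0)
    return coeffs


def cqcc_frame(x, centre, fs, f_min=100.0, n_bins=60, bins_per_octave=12,
               n_linear=128, n_ceps=20):
    """Hypothetical single-frame CQCC: CQT -> log power -> uniform resampling -> DCT."""
    cq = naive_cqt_frame(x, centre, fs, f_min, n_bins, bins_per_octave)
    log_power = np.log(np.abs(cq) ** 2 + 1e-12)          # log power spectrum
    f_geo = f_min * 2 ** (np.arange(n_bins) / bins_per_octave)
    f_lin = np.linspace(f_geo[0], f_geo[-1], n_linear)   # evenly spaced grid
    log_power_lin = np.interp(f_lin, f_geo, log_power)   # uniform resampling (linear interpolation)
    return dct_ii(log_power_lin, n_ceps)                 # cepstral coefficients


fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.sin(2 * np.pi * 2500 * t)
print(cqcc_frame(x, centre=fs // 2, fs=fs).round(2))
```

A practical front end would use a fast CQT and process every frame with a hop size. The direct per-bin evaluation is kept here only because it makes the per-bin window length N_k visible.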
| 0:05:06 | Now, [inaudible] we mainly focus on [inaudible] |
|---|
| 0:05:10 | the ASVspoof 2019 challenge, |
|---|
| 0:05:12 | and we use the standard protocol [inaudible]. |
|---|
| 0:05:15 | [inaudible] |
|---|
| 0:05:22 | [inaudible] |
|---|
| 0:05:25 | And |
|---|
| 0:05:25 | in the following experiments |
|---|
| 0:05:27 | we used |
|---|
| 0:05:27 | the standard ASVspoof 2019 baseline systems. |
|---|
| 0:05:31 | These are the CQCC-GMM system and the LFCC-GMM system. |
|---|
| 0:05:36 | So, |
|---|
| 0:05:37 | for further details of the database and the baseline systems, you can refer |
|---|
| 0:05:42 | to the references. |
|---|
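Both baselines pair a front end (CQCC or LFCC) with a GMM back end. This sketch assumes one common way to score such a back end: the average per-frame log-likelihood ratio between a bona fide GMM and a spoof GMM. That scoring rule is an assumption, not something stated in the talk. The sketch covers scoring only. The model parameters are hypothetical placeholders for GMMs that would normally be trained with EM on each class.

```python
import numpy as np


def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    weighted = log_gauss + np.log(weights)                           # (T, M)
    peak = weighted.max(axis=1, keepdims=True)                       # log-sum-exp for stability
    return peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))


def llr_score(frames, bonafide_gmm, spoof_gmm):
    """Average frame log-likelihood ratio: positive values favour bona fide speech."""
    return (gmm_frame_loglik(frames, *bonafide_gmm).mean()
            - gmm_frame_loglik(frames, *spoof_gmm).mean())


# Hypothetical 2-component, 3-dimensional models standing in for trained GMMs.
rng = np.random.default_rng(0)
bonafide = (np.array([0.5, 0.5]), rng.normal(0, 1, (2, 3)), np.ones((2, 3)))
spoof = (np.array([0.5, 0.5]), rng.normal(3, 1, (2, 3)), np.ones((2, 3)))
frames = rng.normal(0, 1, (100, 3))
print(llr_score(frames, bonafide, spoof))
```

The log-sum-exp step keeps the mixture likelihood numerically stable when individual component densities underflow.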
| 0:05:47 | Now, [inaudible] the baseline results on the ASVspoof 2019 [inaudible] database. |
|---|
| 0:05:54 | There is substantial variation in the performance of the baseline systems. |
|---|
| 0:06:00 | As we can see [inaudible], |
|---|
| 0:06:05 | for example, for attacks A16 and A19, |
|---|
| 0:06:09 | where |
|---|
| 0:06:10 | the CQCC-GMM system |
|---|
| 0:06:12 | gives better performance [inaudible] |
|---|
| 0:06:15 | [inaudible] GMM-based system, |
|---|
| 0:06:18 | whereas |
|---|
| 0:06:19 | for [inaudible], the LFCC-GMM system gives |
|---|
| 0:06:25 | better performance. |
|---|
| 0:06:27 | So why is there a difference in performance |
|---|
| 0:06:30 | [inaudible]? |
|---|
| 0:06:32 | Because of |
|---|
| 0:06:33 | the difference in spectro-temporal resolution |
|---|
| 0:06:37 | in the CQT, which might suggest [inaudible] |
|---|
| 0:06:43 | [inaudible] |
|---|
| 0:06:44 | [inaudible] representing the spectrum [inaudible]. |
|---|
| 0:06:49 | [inaudible], |
|---|
| 0:06:50 | where the difference is basically the difference in performance using |
|---|
| 0:06:55 | the CQCC and the MFCC representations. |
|---|
| 0:06:59 | So we use sub-band analysis. |
|---|
| 0:07:03 | In this sub-band analysis, we [inaudible] representation [inaudible] |
|---|
| 0:07:10 | [inaudible] in the spectral domain representation [inaudible] |
|---|
| 0:07:17 | [inaudible] |
|---|
| 0:07:17 | [inaudible] the information [inaudible] different [inaudible]. |
|---|
| 0:07:23 | So in this slide, |
|---|
| 0:07:24 | [inaudible] representation of a specific [inaudible] |
|---|
| 0:07:29 | [inaudible] |
|---|
| 0:07:30 | [inaudible] |
|---|
| 0:07:33 | [inaudible] frequency [inaudible] |
|---|
| 0:07:41 | and in the leftmost [inaudible] |
|---|
| 0:07:46 | localising the low frequencies [inaudible] localised |
|---|
| 0:07:53 | [inaudible] spectrum, and |
|---|
| 0:07:55 | [inaudible] for the signal, |
|---|
| 0:08:00 | [inaudible] compared to a single |
|---|
| 0:08:05 | band-pass filter. |
|---|
| 0:08:07 | So, |
|---|
| 0:08:08 | for sub-band analysis, |
|---|
| 0:08:09 | [inaudible] |
|---|
| 0:08:15 | [inaudible] |
|---|
| 0:08:18 | [inaudible] the specific content [inaudible]. |
|---|
| 0:08:22 | So, |
|---|
| 0:08:24 | [inaudible] representation [inaudible] the performance for different [inaudible]. |
|---|
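The talk does not spell out how the sub-bands are formed. The sketch below assumes the simplest option:

- Each band is an ideal band-pass, implemented by zeroing FFT bins outside it.
- The band edges are spread evenly from 0 Hz to the Nyquist frequency.

Both choices are hypothetical, as are the `subband_signal` and `equal_width_bands` helpers and the eight-band layout. Each band-limited signal would then go through the same front end and GMM back end, to see which bands carry the discriminative information.

```python
import numpy as np


def subband_signal(x, fs, f_lo, f_hi):
    """Ideal FFT-domain band-pass keeping the half-open band [f_lo, f_hi) in Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def equal_width_bands(fs, n_bands):
    """Split 0 .. fs/2 into n_bands contiguous bands of equal width (hypothetical layout)."""
    edges = np.linspace(0.0, fs / 2, n_bands + 1)
    return list(zip(edges[:-1], edges[1:]))


fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 2500 * t)
for f_lo, f_hi in equal_width_bands(fs, 8):
    band = subband_signal(x, fs, f_lo, f_hi)
    print(f"{f_lo:6.0f}-{f_hi:6.0f} Hz  energy {np.mean(band ** 2):.3f}")
```

Half-open bands keep each FFT bin in exactly one band. The only bin that falls outside every band is the one at exactly the Nyquist frequency.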
| 0:08:35 | Now, |
|---|
| 0:08:37 | [inaudible] in this slide [inaudible] representation |
|---|
| 0:08:43 | of six different [inaudible] |
|---|
| 0:08:51 | [inaudible] |
|---|
| 0:08:52 | and using the CQCC-GMM system, |
|---|
| 0:08:57 | [inaudible] |
|---|
| 0:09:03 | [inaudible] representation using the LFCC-GMM system. |
|---|
| 0:09:09 | So you can see that |
|---|
| 0:09:11 | for example, for attacks A16 and A19, |
|---|
| 0:09:17 | where |
|---|
| 0:09:18 | the CQCC-GMM system |
|---|
| 0:09:19 | [inaudible] performance, and then |
|---|
| 0:09:22 | [inaudible] and A14 [inaudible], |
|---|
| 0:09:27 | [inaudible] gives better performance [inaudible] GMM-based |
|---|
| 0:09:34 | [inaudible] we can see that for those [inaudible] |
|---|
| 0:09:39 | A16 and A19, |
|---|
| 0:09:42 | [inaudible] localising [inaudible] better [inaudible] |
|---|
| 0:09:49 | [inaudible] |
|---|
| 0:09:52 | [inaudible] gives better performance, |
|---|
| 0:09:55 | whereas [inaudible] |
|---|
| 0:09:57 | [inaudible] localising the |
|---|
| 0:10:02 | [inaudible] gives better |
|---|
| 0:10:08 | performance. |
|---|
| 0:10:10 | [inaudible] the performance [inaudible] because |
|---|
| 0:10:15 | [inaudible] are not explicitly localised in the spectrum, for example |
|---|
| 0:10:22 | in the [inaudible] CQCC or LFCC front end. |
|---|
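One reason cepstral features are not explicitly localised in the spectrum is that the discrete cosine transform mixes all frequency bins into every coefficient. The sketch below starts from an assumed flat 64-bin log spectrum. It adds a unit offset to the lowest 8 bins only, then counts how many orthonormal DCT-II coefficients change.

```python
import numpy as np


def dct_ii(v):
    """Orthonormal DCT-II of a 1-D array (same length in and out)."""
    n = len(v)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    coeffs = np.sqrt(2.0 / n) * (basis @ v)
    coeffs[0] /= np.sqrt(2.0)
    return coeffs


n_bins = 64
reference = np.zeros(n_bins)                 # assumed flat log spectrum
perturbed = reference.copy()
perturbed[:8] += 1.0                         # artefact confined to the lowest 8 bins

delta = dct_ii(perturbed) - dct_ii(reference)
changed = np.count_nonzero(np.abs(delta) > 1e-9)
print(f"spectral bins changed: 8 of {n_bins}")
print(f"cepstral coefficients changed: {changed} of {n_bins}")
```

A coefficient stays unchanged only if its cosine basis function sums to zero over the perturbed bins. Every other coefficient absorbs part of the low-frequency change. Because the transform is orthonormal, the total energy of the change is preserved.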
| 0:10:31 | Now, [inaudible] temporal resolution [inaudible]. |
|---|
| 0:10:38 | So in this slide we will explain why [inaudible] front-end [inaudible] |
|---|
| 0:10:43 | [inaudible] |
|---|
| 0:10:44 | [inaudible] |
|---|
| 0:10:49 | So, |
|---|
| 0:10:49 | [inaudible] shown [inaudible] speech frame |
|---|
| 0:10:54 | [inaudible] |
|---|
| 0:10:56 | [inaudible] |
|---|
| 0:10:58 | [inaudible] |
|---|
| 0:11:01 | Please remember that the [inaudible] resolution is represented by the area |
|---|
| 0:11:08 | defined by |
|---|
| 0:11:09 | [inaudible]. |
|---|
| 0:11:12 | So, |
|---|
| 0:11:12 | [inaudible] and now we can see that |
|---|
| 0:11:15 | if [inaudible] part of the spectrum, then |
|---|
| 0:11:22 | [inaudible] this particular [inaudible], |
|---|
| 0:11:24 | it means that only [inaudible] this area is contaminated |
|---|
| 0:11:30 | [inaudible] cepstral coefficients |
|---|
| 0:11:33 | [inaudible] |
|---|
| 0:11:36 | [inaudible] |
|---|
| 0:11:41 | [inaudible] windows |
|---|
| 0:11:44 | [inaudible] |
|---|
| 0:11:45 | [inaudible], which means |
|---|
| 0:11:50 | [inaudible] |
|---|
| 0:11:54 | [inaudible] |
|---|
| 0:11:57 | Now, |
|---|
| 0:11:58 | [inaudible] frame when using uniform resampling. |
|---|
| 0:12:03 | The uniform resampling [inaudible] |
|---|
| 0:12:05 | is normally used in |
|---|
| 0:12:07 | CQCC feature extraction. |
|---|
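Uniform resampling changes which frequencies dominate the input to the DCT. On a geometric (constant-Q) grid, most bins sit at low frequencies. After resampling onto an evenly spaced grid, the same number of points is spread uniformly. The sketch below counts the share of bins below 1 kHz on each grid. The 15.625 Hz to 8 kHz range and 96 bins per octave are illustrative assumptions.

```python
import numpy as np


def fraction_below(freqs, cutoff_hz):
    """Share of frequency bins whose centre lies below cutoff_hz."""
    return float(np.mean(np.asarray(freqs) < cutoff_hz))


f_min, f_max, bins_per_octave = 15.625, 8000.0, 96      # illustrative values
n_bins = int(np.ceil(bins_per_octave * np.log2(f_max / f_min)))
geometric = f_min * 2 ** (np.arange(n_bins) / bins_per_octave)   # CQT bin centres
linear = np.linspace(geometric[0], geometric[-1], n_bins)        # after uniform resampling

for name, grid in (("geometric", geometric), ("linear", linear)):
    print(f"{name:9s}: {fraction_below(grid, 1000.0):.0%} of bins below 1 kHz")
```

This only shows how bins are allocated across frequency. It says nothing by itself about detection performance.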
| 0:12:09 | [inaudible] |
|---|
| 0:12:11 | Well, |
|---|
| 0:12:12 | [inaudible] |
|---|
| 0:12:16 | exactly |
|---|
| 0:12:17 | [inaudible] |
|---|
| 0:12:19 | [inaudible] in the frequency domain. |
|---|
| 0:12:21 | So in this plot we can see that |
|---|
| 0:12:25 | [inaudible] |
|---|
| 0:12:27 | [inaudible] |
|---|
| 0:12:28 | [inaudible] traditional cepstral coefficients [inaudible] |
|---|
| 0:12:35 | higher |
|---|
| 0:12:36 | [inaudible] cepstral |
|---|
| 0:12:39 | [inaudible] cepstral coefficients, which means |
|---|
| 0:12:42 | [inaudible] |
|---|
| 0:12:48 | [inaudible] for the first low-frequency [inaudible] |
|---|
| 0:12:54 | [inaudible] |
|---|
| 0:12:55 | [inaudible] |
|---|
| 0:12:59 | And lastly, we show that |
|---|
| 0:13:01 | [inaudible] |
|---|
| 0:13:05 | [inaudible] |
|---|
| 0:13:06 | [inaudible] sixteen cepstral coefficients [inaudible] uniform |
|---|
| 0:13:11 | [inaudible] better, which means |
|---|
| 0:13:13 | [inaudible] |
|---|
| 0:13:16 | then it would be better to use [inaudible] |
|---|
| 0:13:19 | [inaudible] |
|---|
| 0:13:24 | and |
|---|
| 0:13:25 | [inaudible] constant [inaudible] resolution |
|---|
| 0:13:30 | [inaudible] different spectral [inaudible]. |
|---|
| 0:13:31 | [inaudible] |
|---|
| 0:13:34 | [inaudible] |
|---|
| 0:13:37 | [inaudible] |
|---|
| 0:13:39 | [inaudible] |
|---|
| 0:13:41 | [inaudible] able to capture the |
|---|
| 0:13:45 | [inaudible], |
|---|
| 0:13:47 | whereas |
|---|
| 0:13:48 | [inaudible] CQCCs [inaudible] |
|---|
| 0:13:52 | [inaudible] |
|---|
| 0:13:53 | [inaudible] |
|---|
| 0:13:54 | [inaudible] better performance, and when [inaudible] |
|---|
| 0:13:59 | [inaudible] the spectrum [inaudible], then |
|---|
| 0:14:04 | [inaudible] |
|---|
| 0:14:06 | we get better performance |
|---|
| 0:14:08 | than the CQCCs using uniform resampling. |
|---|
| 0:14:17 | Now, |
|---|
| 0:14:19 | [inaudible] |
|---|
| 0:14:26 | the use of CQCCs with the geometric scale [inaudible] the performance |
|---|
| 0:14:32 | [inaudible] the log spectrum. |
|---|
| 0:14:35 | So [inaudible] in this row |
|---|
| 0:14:39 | [inaudible] |
|---|
| 0:14:41 | [inaudible] |
|---|
| 0:14:46 | [inaudible] |
|---|
| 0:14:49 | [inaudible] |
|---|
| 0:14:55 | [inaudible] representation using the GMM-based system, |
|---|
| 0:14:59 | where |
|---|
| 0:15:00 | [inaudible] |
|---|
| 0:15:01 | [inaudible] |
|---|
| 0:15:03 | [inaudible] representation using [inaudible] |
|---|
| 0:15:11 | [inaudible] |
|---|
| 0:15:13 | [inaudible] we are using the original [inaudible] systems [inaudible] |
|---|
| 0:15:19 | [inaudible] it would be nice to |
|---|
| 0:15:22 | [inaudible] |
|---|
| 0:15:23 | [inaudible] |
|---|
| 0:15:24 | [inaudible] and A14, |
|---|
| 0:15:28 | where we can see that [inaudible] localising [inaudible] |
|---|
| 0:15:32 | [inaudible] |
|---|
| 0:15:33 | [inaudible] |
|---|
| 0:15:36 | In our previous [inaudible], the use of CQCCs with the geometric scale |
|---|
| 0:15:42 | [inaudible] better performance, and Table 2 shows [inaudible]. |
|---|
| 0:15:47 | [inaudible] |
|---|
| 0:15:49 | This is because [inaudible] |
|---|
| 0:15:51 | [inaudible] |
|---|
| 0:15:57 | [inaudible] representation |
|---|
| 0:16:00 | [inaudible] A13 and A14 [inaudible] |
|---|
| 0:16:05 | [inaudible] front end |
|---|
| 0:16:09 | [inaudible] |
|---|
| 0:16:18 | [inaudible] |
|---|
| 0:16:22 | [inaudible] |
|---|
| 0:16:24 | [inaudible] decreases the |
|---|
| 0:16:29 | [inaudible] |
|---|
| 0:16:36 | [inaudible] |
|---|
| 0:16:42 | Now, |
|---|
| 0:16:42 | the conclusion. |
|---|
| 0:16:45 | So, |
|---|
| 0:16:46 | [inaudible] |
|---|
| 0:16:47 | [inaudible] representation [inaudible] |
|---|
| 0:16:53 | representation |
|---|
| 0:16:54 | originally proposed in this [inaudible]. |
|---|
| 0:16:56 | Well, |
|---|
| 0:16:57 | [inaudible] sub-band analysis to identify localised [inaudible]. |
|---|
| 0:17:03 | We also found that [inaudible] different |
|---|
| 0:17:09 | sub-bands, and |
|---|
| 0:17:11 | [inaudible] front end [inaudible] information [inaudible] |
|---|
| 0:17:18 | and |
|---|
| 0:17:19 | [inaudible] front end [inaudible] database. |
|---|
| 0:17:25 | So this finding explains why |
|---|
| 0:17:29 | [inaudible] resolution [inaudible]. |
|---|
| 0:17:33 | [inaudible] |
|---|
| 0:17:39 | Thank you. |
|---|
| 0:17:40 | And if you have any questions, [inaudible]. |
|---|