0:00:16 | Hi, I'm presenting work from SRI International on adaptive mean normalization for unsupervised adaptation of speaker embeddings. |
0:00:28 | We'll first cover the problem statement and what we're actually trying to tackle with this work. |
0:00:33 | We'll then look at the role of mean normalization and how it applies to state-of-the-art speaker recognition systems. |
0:00:40 | We'll then go over the proposed technique, adaptive mean normalization, and have a look at some experiments to see how it performs. |
0:00:48 | First, the problem statement. |
0:00:50 | Variability is well known as one of the biggest challenges to the practical use of speaker recognition systems. |
0:00:56 | Variability here refers to changes in effect between the training of a system and successive detection attempts. |
0:01:04 | It commonly comes in two types. |
0:01:06 | One is extrinsic, that is, something separate from the speaker. This includes things like the microphone, the acoustic environment, and the transmission channel. |
0:01:14 | The other is intrinsic; this has to do with the speaker and may vary over time. Contributors to variability here include health, stress, vocal state, and speaking style. |
0:01:27 | These differences are collectively referred to as domain mismatch when looking at the differences between the system's training scenario and detection attempts. |
0:01:40 | Now, as many of us know, domain mismatch typically results in performance loss for the system. |
0:01:44 | This is a loss with respect to the expected performance of the system: once the system is trained, we have a certain estimate of how it is expected to perform. |
0:01:54 | If the domain in which it is deployed then changes, we observe a loss due to this domain mismatch. |
0:02:01 | Now, this loss is actually two different things. |
0:02:04 | One is discrimination loss; that means the system is less capable of separating speakers. |
0:02:09 | The other is miscalibration. When a system is miscalibrated, it gives a score that might mislead the user into believing something was detected that shouldn't have been, for instance. |
0:02:22 | Domain adaptation can be used to cope with this problem, and there are two different ways of approaching it. One is supervised: this is where we have labeled data, and where we often get reliable improvement, but there is a high cost for the human-labeled data that you end up needing to improve the system. |
0:02:40 | The alternative is unsupervised adaptation. It has a very low cost, since there's actually no end-user labeling at all, plenty of data is available, and it's ideally matched to the deployment conditions. |
0:02:52 | The downside here is that we have no ground-truth labels to rely on. |
0:02:59 | The focus of this work is the unsupervised adaptation scenario. |
0:03:06 | There are some shortcomings of unsupervised adaptation. |
0:03:09 | One is a lack of generalization. Quite a few decisions have to be made in how to apply an unsupervised approach. For instance, if we're going to retrain the PLDA of the system, we end up needing to make some kind of assumptions about which clusters different audio segments belong to with respect to different speakers. |
0:03:30 | It can also overfit to the data it is being trained with. |
0:03:35 | Trustworthiness is another issue: the guarantees of improvement from unsupervised adaptation are limited. |
0:03:42 | Then there's complexity. Some approaches have a high computational load, and that makes it a little more difficult to deliver to clients or users once the system goes out the door. |
0:03:54 | The question we're trying to answer is: where is the best place to apply adaptation in the unsupervised scenario, such that it can be fast and reliable once deployed? |
0:04:04 | On screen here we have a diagram of the different stages of a speaker recognition pipeline, and we can look at what would happen if we applied adaptation at each of these stages. |
0:04:15 | Take the feature extraction, the MFCCs or power-normalized cepstral coefficients, or the speaker embedding extractor. |
0:04:23 | If someone was to tune those, adapting at this point in the pipeline requires full retraining of both the DNN and the back-end modules. You need to have a lot of data on hand to do that process, so that's hard to exploit. |
0:04:38 | What about speech activity detection? There are approaches that apply different SAD stages for different scenarios. This is useful when SAD is actually the domain-sensitive component, but it's only a partial solution: it doesn't really help the discrimination in the rest of the pipeline. |
0:04:57 | What about LDA, PLDA, or calibration? These form some of the core components of the back-end process, but they tend to require labels, or predictions from clustering, and those predictions can be error-prone. |
0:05:11 | Length normalization? Well, there are no parameters to adapt there, so it's not applicable. |
0:05:16 | Which leads us to mean normalization. This is a simple set of parameters, typically about two hundred numbers matching the PLDA input dimensionality, and it requires relatively little data to adapt. |
0:05:28 | But first, what is the role of mean normalization in a system? |
0:05:32 | PLDA is a strong model when its assumptions are fulfilled, namely that the distribution of the data going into it fits a standard normal distribution. |
0:05:43 | In training, mean normalization and length normalization together achieve this, so the assumptions of PLDA are fulfilled when the system is trained. |
0:05:55 | Length normalization actually projects the embeddings onto the unit hypersphere, and there's a diagram right here that demonstrates this: the data is spread evenly around the sphere. This, together with a zero mean, works well during training. |
0:06:11 | The problem is a shifted domain, outside of the scope of training, such as with evaluation data. |
0:06:18 | As shown in this diagram, the embeddings then project onto the unit hypersphere with a distribution that is not evenly spread. The assumptions of the PLDA model are therefore no longer fulfilled, and we actually reduce the discrimination ability of the model. |
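As a minimal sketch of the two normalization steps just described, assuming NumPy arrays; the names here are chosen for illustration and this is not the paper's code:

```python
import numpy as np

def mean_length_normalize(embedding: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Center an embedding on a (training or adapted) mean, then project it
    onto the unit hypersphere via length normalization."""
    centered = embedding - mean                   # mean normalization
    return centered / np.linalg.norm(centered)    # length normalization

# Illustrative usage: the system mean is estimated once from training embeddings.
train_embeddings = np.random.randn(1000, 200)     # stand-in for real embeddings
system_mean = train_embeddings.mean(axis=0)
normalized = mean_length_normalize(train_embeddings[0], system_mean)
```

When the evaluation domain shifts, the stored mean no longer matches the incoming data, which is exactly the failure mode described above.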
0:06:36 | So let's look at actual performance when we compare the use of a system-based mean, where we have taken the mean from the actual training data, against the impact if we instead adapt the mean of the system to a held-out dataset of conditions relevant to the data we benchmark with. |
0:06:57 | There are more details on the evaluation protocol and the datasets used here later in the presentation, but for now, this is a quick snapshot of what happens if you simply update the mean going into the PLDA to a relevant dataset. |
0:07:10 | We actually see the equal error rate improve by up to 19 percent, really helping discrimination. |
0:07:16 | Even more impressive is the fact that Cllr, the cost of the log-likelihood ratio, which is an indication of both discrimination and calibration performance, improves by up to 60 percent. |
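For reference, the standard definition of Cllr over target and non-target trial scores, expressed as log-likelihood ratios $s$, is:

$$
C_{\mathrm{llr}} = \frac{1}{2}\left(\frac{1}{N_{\mathrm{tar}}}\sum_{i\in\mathrm{tar}}\log_2\!\left(1+e^{-s_i}\right) \;+\; \frac{1}{N_{\mathrm{non}}}\sum_{j\in\mathrm{non}}\log_2\!\left(1+e^{s_j}\right)\right)
$$

Lower is better, and a well-calibrated system keeps Cllr close to its discrimination-limited minimum.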
0:07:27 | Now, this is despite holding the calibration model fixed and mismatched to the other conditions. In particular, the calibration model here is trained on the RATS source data, which is clean telephone data, and yet on the SRE datasets and Speakers in the Wild it is dramatically helping calibration. |
0:07:46 | So having a relevant mean really is crucial. |
0:07:52 | Now let's get to adaptive mean normalization. |
0:07:55 | What we've got so far, a single adapted mean, is suitable when evaluation conditions are homogeneous: if we deploy a system and we generally know what it will be used for, and that's not going to vary much, then that idea is okay. |
0:08:10 | The problem comes in when conditions can vary over time or between trials. For instance, when dealing with radio broadcasts, conditions vary over time depending on the signal and the time of day; or maybe a system is being used for both telephone and microphone-style calls. |
0:08:27 | Then we end up having the distribution over here on the bottom right, where we have different means, and you can see how that projects onto the unit hypersphere. |
0:08:36 | This means that, ideally, what we'd love to be able to do here is adapt the mean depending on the conditions of the trial at hand. That means we want to dynamically define the mean as we're processing trials. |
0:08:51 | That's what we aim to do with the proposed method of adaptive mean normalization. |
0:08:57 | So what is it? Well, this process is actually inspired by trial-based calibration, and what trial-based calibration does is dynamically define the system parameters, in particular the calibration model parameters. |
0:09:11 | It looks at the conditions at hand for the trial coming in, on both sides, the enrollment and test audio, defines different subsets of held-out data for those conditions, and then trains a calibration model on the fly using that held-out data. |
0:09:28 | The goal of the system here is to make the system model as general and reliable as possible. |
0:09:37 | One extra advantage here is that, over time, as the system is seeing more and more conditions and more relevant data, it can accommodate the new conditions over time. |
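A rough sketch of the trial-based calibration idea follows; the condition scorer, threshold, and data layout are placeholder assumptions, not SRI's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_on_the_fly(raw_score, enroll_cond, test_cond, held_out_scores,
                         held_out_labels, held_out_conds, similarity_fn, threshold):
    """Train a calibration model per trial from held-out trials whose conditions
    resemble both sides of the current trial, then calibrate the raw score."""
    # Select held-out trials similar in condition to both trial sides.
    mask = np.array([similarity_fn(c, enroll_cond) > threshold and
                     similarity_fn(c, test_cond) > threshold
                     for c in held_out_conds])
    # Assumes the matched subset contains both target and non-target trials.
    model = LogisticRegression().fit(held_out_scores[mask].reshape(-1, 1),
                                     held_out_labels[mask])
    # decision_function returns the calibrated score in log-odds form.
    return model.decision_function([[raw_score]])[0]
```

Adaptive mean normalization borrows this condition-matched selection idea but, as described next, applies it per embedding rather than per trial.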
0:09:48 | Here's the process. Taking a bird's-eye view, we show the pipeline on the left-hand side: embeddings going into mean normalization, length normalization, PLDA, and calibration. We zoom in on the mean normalization and the adaptive process. |
0:10:04 | Where there used to be a single consistent mean, what we're now doing is deriving the mean from a candidate pool. |
0:10:10 | In fact, this is an embedding-specific process, not a trial-specific process, which is a bit of a benefit here in terms of computation. |
0:10:20 | For each embedding, what we do is compute a similarity against a pool of candidate embeddings. What we want to do is find those embeddings from the candidate pool that are similar in condition to the embedding coming in to be adaptively mean-normalized. |
0:10:36 | We make a selection of that subset, and we then find the condition mean based on that subset. Based on how many samples we found and how many we would like to find, we do a weighting process. |
0:10:50 | We then use that mean as the adapted mean for that embedding, and follow on through the rest of the pipeline. |
0:10:59 | What we're trying to do here is make this happen on the fly, and in fact it has very little overhead. |
0:11:08 | There are some ingredients that we need for adaptive mean normalization. |
0:11:12 | In terms of making a comparison between an embedding and a candidate, we need something that can tell us whether the conditions of those embeddings are similar or not. For this we use condition embeddings. These are really like what we use for speaker recognition, except that instead of discriminating speakers, the network is trained to discriminate different conditions. |
0:11:33 | Those conditions include compression type, reverb, noise type, language, and gender. When we combine those factors together we end up with around eleven thousand unique conditions, so it should be a very thin slice of conditions that we're dealing with. |
0:11:49 | We also need a pool of candidate embeddings. This is just a mixture of conditions, nothing controlled really, and ideally it includes some examples of the evaluation conditions. |
0:12:00 | If that's not the case, what the system could actually do after it is deployed is collect incoming test data along the way to populate the candidate pool of embeddings, making it better suited to the conditions. |
0:12:15 | This pool is used to dynamically estimate the mean for the conditions at hand. |
0:12:19 | Finally, there are two parameters. One is the condition similarity threshold: we don't want everything from the candidate pool coming through, so we want to determine how similar each candidate is and make sure it is similar enough to be passed on to the next stage, the mean estimation. |
0:12:35 | The other thing that we want to set is the maximum number of candidates to select. If everything in the candidate pool was above the threshold, everything would be passed through, and maybe we'd get no benefit over the non-adaptive system. We want to make sure that, if we have that many matches, we just select the top number of them. |
0:12:55 | So if we then go back to our picture here, we can fill in a few different things. |
0:13:00 | For instance, the comparison is now done via the condition embeddings. We then do the selection process, where n is the number of candidates above the similarity threshold. |
0:13:11 | However, if n is more than the maximum we allow, which is N, we keep the N candidates with the highest similarity, thereby making sure we keep the most relevant ones for our mean estimate. |
0:13:24 | Once we estimate the mean, we go on to a weighted average with the system mean. That weighted average means the closer we get to that target value N, the more we rely on the dynamically estimated mean, whereas we fall back to the system mean in the case that no relevant samples could be found. |
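One plausible reading of the weighting just described, treating the interpolation weight below as an assumption rather than the paper's stated formula, is

$$
\boldsymbol{\mu}_{\mathrm{adapted}} = w\,\boldsymbol{\mu}_{\mathrm{dynamic}} + (1-w)\,\boldsymbol{\mu}_{\mathrm{system}},\qquad w=\frac{\min(n,N)}{N}
$$

and a minimal end-to-end sketch of the per-embedding procedure might look like this, with the condition similarity scorer abstracted away and all names illustrative:

```python
import numpy as np

def adaptive_mean(cond_emb, pool_embs, pool_cond_embs, system_mean,
                  similarity_fn, threshold=10.0, max_candidates=100):
    """Estimate an embedding-specific mean from condition-matched candidates,
    blended with the system mean according to how many matches were found."""
    # Score the incoming embedding's condition against every pool candidate.
    sims = np.array([similarity_fn(cond_emb, c) for c in pool_cond_embs])
    selected = np.where(sims > threshold)[0]        # n candidates above threshold
    if selected.size == 0:                          # no relevant samples found:
        return system_mean                          # fall back to the system mean
    if selected.size > max_candidates:              # keep only the N most similar
        selected = selected[np.argsort(sims[selected])[-max_candidates:]]
    dynamic_mean = pool_embs[selected].mean(axis=0)
    w = min(selected.size, max_candidates) / max_candidates
    return w * dynamic_mean + (1.0 - w) * system_mean
```

The returned mean is then used in place of the static system mean when mean-normalizing that embedding.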
0:13:48 | Let's run through a few of the benefits of adaptive mean normalization. |
0:13:52 | As I said, it has very minimal overhead, and that overhead is defined by the number of candidate examples it has to compare against. |
0:14:01 | It is also applied per embedding instead of per trial, which is what's usually done in trial-based calibration, and that in itself does a lot in terms of reducing computation. |
0:14:12 | It copes with the case of no relevant examples, where the mean reverts to just the system mean through the weighted average. |
0:14:20 | New enrollment audio or test audio could actually be collected over time into the candidate pool, and this allows the most relevant changes in domain to be captured while the system is being used. |
0:14:33 | It's a simple process, with the parameters being under two hundred numbers that are changing here. It also weights against the system mean, which makes overfitting a little more difficult, which is a real benefit. |
0:14:45 | Finally, what we find quite impressive here is that it allows a single static calibration model to be applied across domains. That's exactly the problem that trial-based calibration was trying to solve by adapting the calibration model. |
0:15:00 | We've gone a step further than that and allowed the front of the system to adapt the mean, which leaves the calibration model as just a static one. |
0:15:09 | In essence, ensuring a relevant mean normalization there allows the PLDA assumptions to be fulfilled, such that a calibration model applied after PLDA scoring is also suitable. |
0:15:21 | Let's take a look at the experiments. |
0:15:24 | First of all, the baseline system we used is the SRI team submission for the recent NIST SRE evaluation. This involves 16 kHz power-normalized cepstral coefficients and multi-band training of the embeddings. |
0:15:42 | Multi-band means that we trained the embedding system with both 8 kHz and 16 kHz data: any time we had 16 kHz audio, we also downsampled it to 8 kHz, so that the DNN was exposed to both 8 kHz and 16 kHz versions of the audio segments. That tended to help bridge the gap between 8 kHz and 16 kHz evaluation data. |
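As a small illustration of that augmentation step, assuming SciPy is available (a sketch, not SRI's pipeline code):

```python
from scipy.signal import resample_poly

def add_downsampled_copy(audio_16k):
    """Given 16 kHz samples, return both the original and an 8 kHz copy,
    so the embedding DNN is exposed to each segment in both bandwidths."""
    audio_8k = resample_poly(audio_16k, up=1, down=2)  # 16 kHz -> 8 kHz
    return audio_16k, audio_8k
```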
0:16:06 | We trained on the standard datasets; the references for those datasets are in the paper, and we applied the standard augmentation as well. |
0:16:13 | As mentioned before, the calibration model here is trained on the RATS source data. That is from the DARPA RATS program, but it is the telephone, relatively clean data, not the transmission data, which is heavily degraded. |
0:16:28 | In terms of evaluation, we split the data into evaluation and normalization sets. |
0:16:35 | From the NIST SRE corpora, 2016 and 2019 both have their own norm sets available, known as the unlabeled data, as you can see in the table. |
0:16:47 | For Speakers in the Wild, we used the eval portion for evaluation and the dev portion for the norm set; again, the speakers for these are disjoint. |
0:16:55 | As for the RATS source data, we actually split this corpus to have two different speaker pools: one for evaluation and one for the normalization step. |
0:17:07 | In terms of the adaptive mean normalization parameters, we set a condition similarity threshold of 10, and the maximum number of candidates N to half the number of candidate samples in the dataset's pool. These were set by searching on the RATS source data. |
0:17:24 | So you can see how many segments were available for each norm set, including the pools, which we use initially, and the value of N for candidates that we're trying to reach. |
0:17:37 | Remember that the value of N also drives the weighted average with the dynamically estimated mean: the closer we get to N, the more it relies on the adapted mean. |
0:17:54 | Let's look at the out-of-the-box performance. |
0:17:57 | We've got four different datasets here: SRE16, SRE19, Speakers in the Wild, and the RATS clean telephone data. |
0:18:05 | The baseline system here, what we consider the baseline norm, simply uses the mean that was estimated during the training of the system, and the calibration model is trained on RATS. |
0:18:15 | Now, what happens if we instead adapt the calibration model to the actual eval set? This is a cheating experiment, on the right-hand side: essentially what we're doing is replacing the RATS calibration model with an eval-set calibration model. |
0:18:36 | What we can see is that we're getting much better calibration performance on some datasets; the acoustically matched RATS doesn't change too much. |
0:18:48 | The equal error rates tend to vary widely between these different datasets, but the calibration is considerably better by comparison. You can see the lowest Cllr value for RATS, because it is matched to the RATS data used in the calibration model. |
0:19:09 | Let's look at the impact of relevant mean normalization. |
0:19:13 | Previously in this presentation we showed the first two columns here: the baseline and the condition-based mean normalization. Now we're adding the third column: adaptive mean normalization. |
0:19:26 | What we've done here is use adaptive mean normalization with the pool formed from the held-out datasets joined together: the SRE16, SRE19, Speakers in the Wild, and RATS data all form the held-out dataset as one big candidate pool. |
0:19:41 | The adaptive mean normalization was able to outperform the condition-specific mean normalization in the heterogeneous conditions, in particular on Speakers in the Wild and the SRE datasets. |
0:19:54 | The calibration performance there is improving quite significantly in some cases, and in others it's still a nice improvement. The equal error rate of SRE16 also improves quite reasonably. |
0:20:11 | What's interesting is that the adaptive process didn't really hurt the directly matched condition, so there's a bit of a benefit there as well. |
0:20:20 | Now on to data requirements: how much data do we actually need in the candidate segments to make adaptive mean normalization work? |
0:20:29 | What we've done on this slide is look at the Cllr, which, remember, measures both discrimination and calibration performance. The dashed lines are the baseline performance across the four different datasets, and the solid lines are what happens as we vary the number of candidate segments. |
0:20:50 | Remember, the full pools each had at least one thousand two hundred samples. We're doing this in a dataset-specific scenario where, for instance, for SRE16 the candidate pool is the actual unlabeled data from SRE16, so it is suited to the conditions, and we randomly select from that held-out norm set. |
0:21:14 | What we see is that quite rapidly, after 32 relevant segments, independent of the corpus, we're already in front of the baseline Cllr. So 32 segments are already sufficient for the significant Cllr improvement we saw earlier. |
0:21:29 | This also happened for equal error rate in terms of the trend, though with not quite so much of a relative gain. Again, 32 relevant segments from the target domain was enough for this adaptive process to get a good gain. |
0:21:47 | Now, importantly, what happens when we have adaptive mean normalization deployed and the data in the candidate pool is mismatched to the conditions that are going to be evaluated? |
0:22:00 | We wanted to see what happens in this case, so for each dataset we benchmark here, we excluded the relevant data from the candidate pool for that set. |
0:22:12 | For instance, with the RATS dataset, down the bottom of the table, we actually excluded RATS from the pool of candidates and just retained Speakers in the Wild and the two SRE datasets in the candidate pool, and that's all it had to select from in order to estimate the mean on the fly and adapt the system. |
0:22:29 | Remember, if it can't actually find anything that it believes is relevant, it falls back to the system mean, so we would anticipate that the performance would be the same as the baseline system, or better. |
0:22:45 | What we can see is that Speakers in the Wild and RATS actually perform reasonably well. There was an improvement still with Speakers in the Wild, which is surprising; with RATS, Cllr degraded just a little bit. |
0:22:59 | SRE16, however, degraded quite a bit with respect to the baseline, without any relevant data for the mean. |
0:23:07 | We tried varying the selection threshold here, in the hope that by using a higher threshold we would restrict the subset selection to only the closest candidates possible, but this didn't help. |
0:23:18 | What this indicates is that there was a problem with the condition similarity: the audio that it was selecting wasn't quite optimal in this mismatched scenario. |
0:23:32 | So, in summary, we proposed adaptive mean normalization. It's simple and effective, leveraging the conditions of the test data as the system is used. |
0:23:41 | It's useful with just a handful of samples; in fact, 32 samples of speech should be sufficient. |
0:23:48 | For discrimination, we saw improvements of about 26 percent, and in terms of calibration, measured through the Cllr, we saw improvements of up to 66 percent relative over the baseline system. |
0:24:01 | What's important here is that it actually allows a static calibration model to become suitable for varying conditions, and that's a real benefit once the system goes out the door. |
0:24:11 | In terms of future work, we identified a couple of things. We want to enhance the selection method to be robust when relevant matches are lacking, as in that very last experiment. We also want to run experiments on how active learning over time can improve the candidate pool, by collecting live test data over time that's relevant to the examples and retaining a recent history. |
0:24:37 | That ends the presentation. I'd be happy to hear any remarks or questions from anyone. Thank you. |