| 0:00:26 | hi everyone i |
|---|
| 0:00:28 | moneymaking sponsored by |
|---|
| 0:00:29 | i come from the northwestern polytechnical university of china |
|---|
| 0:00:34 | this video is a presentation of my paper |
|---|
| 0:00:38 | although word for work or |
|---|
| 0:00:40 | workshop of the odyssey two thousand |
|---|
| 0:00:42 | and twenty |
|---|
| 0:00:46 | now that speaking |
|---|
| 0:00:47 | the title of this paper is partial auc metric learning based speaker verification back |
|---|
| 0:00:53 | end |
|---|
| 0:00:54 | in other words |
|---|
| 0:00:56 | this paper proposes a shallow metric learning back end algorithm for speaker verification |
|---|
| 0:01:20 | okay i will present it from these four aspects |
|---|
| 0:01:24 | including the metric learning and the motivation |
|---|
| 0:01:28 | the proposed objective function |
|---|
| 0:01:31 | some experimental results |
|---|
| 0:01:34 | and at last i will give some conclusions |
|---|
| 0:01:38 | and i will also introduce several follow-up works of |
|---|
| 0:01:41 | this paper and |
|---|
| 0:01:43 | our future plans |
|---|
| 0:01:48 | first |
|---|
| 0:01:49 | the metric learning and the motivation |
|---|
| 0:01:55 | as illustrated in the title i think if we can answer these two questions |
|---|
| 0:02:01 | the motivation of this paper will be clear |
|---|
| 0:02:06 | the first one is what is the metric learning and why we proposed a |
|---|
| 0:02:12 | metric learning based back end algorithm |
|---|
| 0:02:22 | the metric learning aims to learn a distance function to measure the similarity for example |
|---|
| 0:02:28 | the mahalanobis distance |
|---|
| 0:02:32 | for speaker verification as displayed in the right figure of this slide |
|---|
| 0:02:39 | we first extract the speaker identity features from utterances by a front |
|---|
| 0:02:45 | end speaker feature extractor |
|---|
| 0:02:48 | such as the i-vector or the x-vector extractor |
|---|
| 0:02:52 | and then we feed them to the metric learning based back end to |
|---|
| 0:02:57 | calculate the |
|---|
| 0:02:59 | similarity scores |
|---|
| 0:03:02 | for the learning of the metric |
|---|
| 0:03:05 | we |
|---|
| 0:03:07 | employed a loss function based on the optimisation of the partial auc as displayed |
|---|
| 0:03:13 | in the left figure of this slide |
|---|
| 0:03:22 | for the metric learning i think the first advantage is that the |
|---|
| 0:03:27 | training of the distance function is consistent with the evaluation procedure |
|---|
| 0:03:33 | therefore the back end can directly optimize the |
|---|
| 0:03:38 | evaluation metrics for speaker verification |
|---|
| 0:03:42 | such as the equal error rate the auc |
|---|
| 0:03:47 | and so on |
|---|
| 0:03:50 | second it can be easily combined with the |
|---|
| 0:03:56 | existing front ends for example the i-vector or the x-vector |
|---|
| 0:04:04 | third this shallow metric learning method can be easily extended to the |
|---|
| 0:04:09 | end-to-end framework |
|---|
| 0:04:18 | the second question i need to answer is |
|---|
| 0:04:21 | what is the partial auc |
|---|
| 0:04:24 | and |
|---|
| 0:04:25 | why the metric learning back end aims at optimising the |
|---|
| 0:04:30 | partial auc |
|---|
| 0:04:38 | in the |
|---|
| 0:04:39 | left figure of this slide |
|---|
| 0:04:42 | the partial auc is defined as a small part of the area under |
|---|
| 0:04:47 | the roc curve |
|---|
| 0:04:49 | like |
|---|
| 0:04:50 | this colored area |
|---|
| 0:04:53 | although |
|---|
| 0:04:54 | the metric learning can directly optimize the evaluation metrics |
|---|
| 0:04:59 | its implementation faces |
|---|
| 0:05:01 | some difficulties |
|---|
| 0:05:05 | as we all know |
|---|
| 0:05:06 | we need to construct pairwise or triplet training trials with speaker-level |
|---|
| 0:05:11 | labels to train the distance function |
|---|
| 0:05:15 | in metric learning |
|---|
| 0:05:17 | in this condition |
|---|
| 0:05:19 | the number of all possible training trials |
|---|
| 0:05:22 | is very large |
|---|
| 0:05:24 | besides many easily distinguishable trials are unnecessary to the training of the distance function |
|---|
| 0:05:32 | in terms of these difficulties |
|---|
| 0:05:35 | i think |
|---|
| 0:05:36 | the optimisation of the pauc has the |
|---|
| 0:05:40 | following two advantages |
|---|
| 0:05:44 | first |
|---|
| 0:05:45 | it is easy to select the difficult samples by setting alpha to |
|---|
| 0:05:51 | zero |
|---|
| 0:05:54 | and beta to a |
|---|
| 0:05:58 | relatively small value |
|---|
| 0:06:00 | in this way |
|---|
| 0:06:01 | we can also reduce the number of the |
|---|
| 0:06:05 | training trials |
|---|
| 0:06:08 | second we can optimize the interested partial auc according to some specific applications |
|---|
| 0:06:16 | and obviously |
|---|
| 0:06:17 | auc is a special case of partial auc |
|---|
| 0:06:27 | next |
|---|
| 0:06:28 | in the second part i will present the objective function of the proposed |
|---|
| 0:06:34 | algorithm |
|---|
| 0:06:44 | in this slide i will introduce how to calculate the partial auc |
|---|
| 0:06:51 | and metric learning needs to construct pairwise trials |
|---|
| 0:06:57 | here we don't discuss how to construct them |
|---|
| 0:07:00 | and we define that |
|---|
| 0:07:03 | t is an already constructed training set |
|---|
| 0:07:06 | here x and y are |
|---|
| 0:07:10 | speaker features of two speech segments |
|---|
| 0:07:14 | l is the ground truth label |
|---|
| 0:07:17 | if they come from the same speaker |
|---|
| 0:07:19 | l is equal to one |
|---|
| 0:07:21 | otherwise l is equal to zero |
|---|
| 0:07:26 | besides the function s |
|---|
| 0:07:29 | is used to calculate the similarity |
|---|
| 0:07:32 | of two speaker features |
|---|
| 0:07:35 | here we use the mahalanobis distance function |
|---|
| 0:07:40 | the predicted label l hat can be obtained by a comparison of the distance |
|---|
| 0:07:47 | score |
|---|
| 0:07:48 | s |
|---|
| 0:07:50 | and theta where theta is the threshold |
|---|
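
The scoring and decision step described above can be sketched as follows. This is a minimal illustration assuming NumPy, not the implementation from the paper: the function names, the choice to negate the squared Mahalanobis distance so that a larger score means "more similar", and the direction of the threshold comparison are all assumptions made for this sketch.

```python
import numpy as np

def mahalanobis_similarity(x, y, M):
    """Similarity of two speaker features under a matrix M (hypothetical helper).

    Returns the negative squared Mahalanobis distance, so that a larger
    value means the two features are closer. M is assumed to be a
    symmetric positive semi-definite matrix learned by the back end.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return -float(d @ np.asarray(M, dtype=float) @ d)

def predict_label(x, y, M, theta):
    """Predicted label l_hat: 1 (same speaker) if the score reaches theta, else 0."""
    return 1 if mahalanobis_similarity(x, y, M) >= theta else 0
```

With the identity matrix as `M` the score reduces to the negative squared Euclidean distance, which is a convenient sanity check before plugging in a learned matrix.
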
| 0:07:55 | given a fixed value of theta we are able to compute the true positive |
|---|
| 0:08:00 | rate t p r |
|---|
| 0:08:04 | and the |
|---|
| 0:08:07 | false |
|---|
| 0:08:08 | positive rate f p r |
|---|
| 0:08:13 | by varying theta we can get a series of t p r and f p |
|---|
| 0:08:18 | r |
|---|
| 0:08:19 | which form |
|---|
| 0:08:20 | an roc curve |
|---|
| 0:08:22 | as shown in the figure |
|---|
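
A short sketch of how a fixed threshold yields one (TPR, FPR) pair and how sweeping the threshold traces an ROC curve. It assumes NumPy and scores where larger means more similar; the function names and the "accept if score >= theta" rule are assumptions for illustration only.

```python
import numpy as np

def tpr_fpr(scores, labels, theta):
    """True positive rate and false positive rate at threshold theta."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    accepted = scores >= theta                   # trials declared "same speaker"
    tpr = float(np.mean(accepted[labels == 1]))  # accepted share of positive trials
    fpr = float(np.mean(accepted[labels == 0]))  # accepted share of negative trials
    return tpr, fpr

def roc_points(scores, labels):
    """(TPR, FPR) pairs obtained by sweeping theta over the observed scores."""
    thresholds = np.sort(np.unique(scores))[::-1]  # from strict to permissive
    return [tpr_fpr(scores, labels, t) for t in thresholds]
```

Lowering the threshold can only accept more trials, so both rates grow monotonically along the returned list.
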
| 0:08:27 | and to really optimize the entire |
|---|
| 0:08:32 | optimising the entire roc curve is a natural thought |
|---|
| 0:08:39 | however this is not only costly but also unnecessary |
|---|
| 0:08:43 | because most practical systems |
|---|
| 0:08:46 | work |
|---|
| 0:08:47 | and only practical |
|---|
| 0:08:52 | because most practical systems |
|---|
| 0:08:55 | work |
|---|
| 0:08:56 | on a part of the roc curve |
|---|
| 0:09:02 | for example |
|---|
| 0:09:04 | a bank security system usually requires a small false positive rate |
|---|
| 0:09:09 | in contrast |
|---|
| 0:09:11 | a terrorist detection system always hopes |
|---|
| 0:09:15 | for a |
|---|
| 0:09:16 | high recall rate |
|---|
| 0:09:21 | so optimizing the partial auc over the working part of the roc curve is |
|---|
| 0:09:27 | a better choice |
|---|
| 0:09:27 | a better choice |
|---|
| 0:09:32 | in this slide |
|---|
| 0:09:34 | t is the constructed pairwise training set |
|---|
| 0:09:38 | p and n are |
|---|
| 0:09:41 | the positive and negative subsets of t |
|---|
| 0:09:45 | then we need to construct a new subset n zero |
|---|
| 0:09:52 | vol |
|---|
| 0:09:54 | from n by adding the |
|---|
| 0:09:57 | constraint that the value of f p r is between |
|---|
| 0:10:03 | alpha and beta |
|---|
| 0:10:05 | in order to compute n zero we first need to calculate the index for alpha |
|---|
| 0:10:10 | and the one for beta by this formula |
|---|
| 0:10:15 | then all the scores of the negative set |
|---|
| 0:10:22 | so |
|---|
| 0:10:23 | sorted in ascending order |
|---|
| 0:10:25 | and then |
|---|
| 0:10:26 | n zero is selected as the subset of the samples ranked from the alpha |
|---|
| 0:10:31 | to the |
|---|
| 0:10:33 | beta position of the sorted scores |
|---|
| 0:10:39 | after obtaining n zero |
|---|
| 0:10:42 | pauc can be calculated as a |
|---|
| 0:10:45 | normalized |
|---|
| 0:10:46 | auc |
|---|
| 0:10:48 | over p |
|---|
| 0:10:49 | and n zero |
|---|
| 0:10:57 | specifically |
|---|
| 0:11:00 | the partial auc is calculated by the |
|---|
| 0:11:04 | second formula |
|---|
| 0:11:05 | of this slide |
|---|
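
The selection of the negative subset and the normalized AUC over it can be sketched as below. This is a hedged illustration assuming NumPy, not the formula on the slide: the index rule (`ceil(alpha * |N|)` and `floor(beta * |N|)`), the names `j_alpha` and `j_beta`, and the half credit for ties are assumptions. The scores are treated as similarities, so negatives are ranked from most to least confusable, which is the same order as sorting distances in ascending order.

```python
import math
import numpy as np

def select_hard_negatives(neg_scores, alpha, beta):
    """Negatives whose rank corresponds to FPR in [alpha, beta] (hypothetical helper)."""
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # most confusable first
    j_alpha = math.ceil(alpha * len(neg))   # assumed lower index bound
    j_beta = math.floor(beta * len(neg))    # assumed upper index bound
    return neg[j_alpha:j_beta]

def partial_auc(pos_scores, neg_scores, alpha, beta):
    """Normalized AUC computed over the positives and the selected negatives."""
    pos = np.asarray(pos_scores, dtype=float)
    n0 = select_hard_negatives(neg_scores, alpha, beta)
    if len(pos) == 0 or len(n0) == 0:
        raise ValueError("need at least one positive and one selected negative")
    diff = pos[:, None] - n0[None, :]        # every positive against every negative
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct) / (len(pos) * len(n0))
```

Setting `alpha=0` and `beta=1` selects every negative trial, which recovers the ordinary AUC as the special case mentioned earlier in the talk.
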
| 0:11:07 | here |
|---|
| 0:11:09 | i |
|---|
| 0:11:10 | is an indicator function so directly optimising this formula is np-hard therefore we need to |
|---|
| 0:11:17 | relax it |
|---|
| 0:11:20 | elias there's no |
|---|
| 0:11:22 | here we relax the pauc calculation function by replacing the indicator function with a hinge loss |
|---|
| 0:11:28 | function |
|---|
| 0:11:32 | here |
|---|
| 0:11:33 | delta is a tunable hyper parameter and it is larger than zero |
|---|
| 0:11:40 | the |
|---|
| 0:11:41 | last formula gives the relaxed loss function |
|---|
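
The relaxation described here, replacing the indicator function with a hinge loss that has a positive margin delta, can be sketched as follows. This assumes NumPy and a plain (non-squared) hinge averaged over all positive and selected-negative pairs; the exact form used in the paper may differ, and the function name is hypothetical.

```python
import numpy as np

def pauc_hinge_loss(pos_scores, hard_neg_scores, delta=1.0):
    """Hinge relaxation of the pairwise ranking error over P and N0.

    A pair contributes zero loss only when the positive score exceeds the
    negative score by at least the margin delta (assumed > 0).
    """
    pos = np.asarray(pos_scores, dtype=float)
    n0 = np.asarray(hard_neg_scores, dtype=float)
    margins = pos[:, None] - n0[None, :]             # s(positive) - s(negative)
    return float(np.mean(np.maximum(0.0, delta - margins)))
```

Unlike the indicator-based count, this loss is piecewise linear in the scores, so it can be minimized with gradient-based methods.
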
| 0:11:48 | to prevent |
|---|
| 0:11:50 | to prevent this |
|---|
| 0:11:53 | loss function |
|---|
| 0:11:54 | overfitting to the training data we also |
|---|
| 0:11:57 | add a |
|---|
| 0:11:59 | regularization term |
|---|
| 0:12:01 | the land that all mean a |
|---|
| 0:12:04 | to the minimization problem |
|---|
| 0:12:08 | finally |
|---|
| 0:12:09 | this green part enlarges the between-class distance |
|---|
| 0:12:13 | and this red part |
|---|
| 0:12:16 | tries to minimize the within-class variance |
|---|
| 0:12:19 | in other words our objective function aims |
|---|
| 0:12:23 | at |
|---|
| 0:12:24 | enlarging the margin |
|---|
| 0:12:26 | between the |
|---|
| 0:12:28 | positive and |
|---|
| 0:12:29 | negative trials and minimizing the within-class variance |
|---|
| 0:12:35 | of the two classes of trials simultaneously |
|---|
| 0:12:42 | in the third part i will give some experimental results |
|---|
| 0:12:47 | this slide displays our experimental settings |
|---|
| 0:12:52 | more details can be found in the paper |
|---|
| 0:12:57 | this paper |
|---|
| 0:12:58 | this table lists the comparison results on the cantonese data set |
|---|
| 0:13:07 | it is seen that the proposed |
|---|
| 0:13:10 | pauc metric achieves better performance than p lda |
|---|
| 0:13:14 | given both the i-vector and the x-vector front ends |
|---|
| 0:13:19 | specifically the pauc metric obtains |
|---|
| 0:13:24 | not persons and to twenty percent relative improvement over p lda |
|---|
| 0:13:30 | in terms of the |
|---|
| 0:13:32 | pauc and auc |
|---|
| 0:13:34 | actually |
|---|
| 0:13:38 | respectively |
|---|
| 0:13:40 | moreover |
|---|
| 0:13:41 | it achieves more than eleven percent relative eer reduction |
|---|
| 0:13:49 | and five percent |
|---|
| 0:13:51 | relative dcf reduction over p lda |
|---|
| 0:13:58 | table two lists the results on the core task |
|---|
| 0:14:03 | of the s i t w data set |
|---|
| 0:14:05 | it is seen that the proposed |
|---|
| 0:14:08 | pauc metric achieves better performance than p lda |
|---|
| 0:14:13 | specifically when the x-vector front end is used |
|---|
| 0:14:18 | pauc metric achieves more than |
|---|
| 0:14:21 | eight percent |
|---|
| 0:14:23 | relative pauc |
|---|
| 0:14:25 | improvement over p lda |
|---|
| 0:14:28 | if the |
|---|
| 0:14:29 | it also |
|---|
| 0:14:30 | obtains |
|---|
| 0:14:31 | more than |
|---|
| 0:14:32 | twenty percent and the channel or since about it you a was the improvements on |
|---|
| 0:14:37 | the development and evaluation core tasks respectively |
|---|
| 0:14:43 | moreover it achieves |
|---|
| 0:14:46 | ten percent relative eer reduction and the |
|---|
| 0:14:49 | three percent relative dcf |
|---|
| 0:14:52 | reduction |
|---|
| 0:14:53 | although the performance improvement with the |
|---|
| 0:14:56 | i-vector front end is not so significant |
|---|
| 0:15:00 | as that with the x-vector front end |
|---|
| 0:15:03 | the trends with different front ends are consistent |
|---|
| 0:15:10 | this page displays some experimental results |
|---|
| 0:15:15 | which are used to analyse the effect of the hyper parameters |
|---|
| 0:15:21 | we adopt a |
|---|
| 0:15:22 | grid search to study the impact of the values of alpha and beta on performance |
|---|
| 0:15:29 | in the |
|---|
| 0:15:31 | a vector |
|---|
| 0:15:32 | yes |
|---|
| 0:15:33 | from these two tables we can see that the working region is quite large |
|---|
| 0:15:42 | this figure shows the relative performance improvement over p lda |
|---|
| 0:15:48 | in terms of the different values |
|---|
| 0:15:50 | of delta |
|---|
| 0:15:52 | in the objective function |
|---|
| 0:15:55 | from this figure we find that the pauc metric is robust |
|---|
| 0:16:01 | and the best value is around one point two |
|---|
| 0:16:05 | five |
|---|
| 0:16:14 | finally |
|---|
| 0:16:15 | i will give some conclusions and introduce several follow-up works as well as some |
|---|
| 0:16:21 | of our future plans |
|---|
| 0:16:33 | in this paper |
|---|
| 0:16:35 | a mahalanobis distance based metric learning back end is proposed to optimize the partial auc |
|---|
| 0:16:42 | for speaker verification |
|---|
| 0:16:47 | because directly optimizing |
|---|
| 0:16:50 | the partial auc is np-hard |
|---|
| 0:16:53 | we relax it by a hinge loss function |
|---|
| 0:16:56 | experimental results |
|---|
| 0:16:58 | carried out on the |
|---|
| 0:17:00 | nist sre sixteen and |
|---|
| 0:17:02 | s i t w data sets |
|---|
| 0:17:05 | demonstrate the effectiveness of our proposed algorithm |
|---|
| 0:17:14 | after this work we also mad the general done normalization |
|---|
| 0:17:20 | and a comprehensive analysis |
|---|
| 0:17:23 | of the pauc metric |
|---|
| 0:17:27 | we show me |
|---|
| 0:17:29 | published as the |
|---|
| 0:17:30 | without relative without |
|---|
| 0:17:33 | in this paper |
|---|
| 0:17:37 | besides |
|---|
| 0:17:38 | we also extended the |
|---|
| 0:17:42 | pauc metric to an end-to-end framework |
|---|
| 0:17:51 | more information can be found in this too |
|---|
| 0:17:54 | more information can be found in this paper |
|---|
| 0:18:00 | in the future |
|---|
| 0:18:03 | we will research more general metric learning based speaker verification algorithms |
|---|
| 0:18:08 | to optimize |
|---|
| 0:18:11 | evaluation metrics |
|---|
| 0:18:13 | in order to |
|---|
| 0:18:15 | further improve speaker verification performance |
|---|
| 0:18:23 | that is all for my presentation |
|---|
| 0:18:26 | thank you for watching |
|---|