| 0:00:14 | i variable |
|---|
| 0:00:16 | the to have we really fair use the i per speaker characterization using key and |
|---|
| 0:00:22 | then |
|---|
| 0:00:25 | sure there's |
|---|
| 0:00:27 | speaker i four nist sre two so |
|---|
| 0:00:31 | the right |
|---|
| 0:00:33 | my and gently one |
|---|
| 0:00:38 | basically nine |
|---|
| 0:00:40 | first we like that you a large |
|---|
| 0:00:45 | my boss range |
|---|
| 0:00:47 | and that we use the used a |
|---|
| 0:00:50 | five |
|---|
| 0:00:51 | and we the tree |
|---|
| 0:00:56 | about the punch |
|---|
| 0:00:58 | the network based speaker dataset |
|---|
| 0:01:02 | and three demonstrate very what also |
|---|
| 0:01:06 | and |
|---|
| 0:01:07 | because the mainstream mixture |
|---|
| 0:01:10 | different |
|---|
| 0:01:11 | i for one thing works structure what |
|---|
| 0:01:14 | oops |
|---|
| 0:01:15 | such as convolution one they work |
|---|
| 0:01:21 | i did you walk |
|---|
| 0:01:23 | here |
|---|
| 0:01:26 | the lowest eer |
|---|
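
The equal error rate (EER) mentioned here is the operating point at which the miss rate equals the false-alarm rate. A minimal sketch of how it can be estimated from trial scores follows. The function name and the toy scores in the usage note are illustrative assumptions and do not come from the talk.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss rate: fraction of target trials rejected (score below threshold).
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: fraction of non-target trials accepted.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + fa) / 2.0
    return eer
```

For example, `equal_error_rate([2, 3, 4, 5], [0, 1, 2.5, 1.5])` finds the threshold where one of four targets is missed and one of four non-targets is accepted.
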
| 0:01:27 | a vectorized that's or |
|---|
| 0:01:31 | in speaker embedding |
|---|
| 0:01:34 | area cordoned sartre soccer but it to freeze |
|---|
| 0:01:38 | a pension |
|---|
| 0:01:39 | in that picture |
|---|
| 0:01:41 | so |
|---|
| 0:01:42 | we can is t |
|---|
| 0:01:44 | use a better to better talk of these two to speaker recognition |
|---|
| 0:01:52 | this paper is process speaker characterization |
|---|
| 0:01:56 | using active they only work |
|---|
| 0:01:58 | don't |
|---|
| 0:01:59 | sure that |
|---|
| 0:02:00 | then we don't work a protection a call at a robust protection |
|---|
| 0:02:06 | the speaker |
|---|
| 0:02:10 | and the |
|---|
| 0:02:12 | well |
|---|
| 0:02:13 | right dependability |
|---|
| 0:02:15 | used |
|---|
| 0:02:17 | is are |
|---|
| 0:02:18 | the variation that that's the |
|---|
| 0:02:21 | the next baseline if the park on speaker recognition evaluation |
|---|
| 0:02:27 | kentucky by the |
|---|
| 0:02:29 | you first nation on thirty two hours passed |
|---|
| 0:02:32 | and there are large |
|---|
| 0:02:34 | since |
|---|
| 0:02:35 | nineteen ninety six |
|---|
| 0:02:40 | for real application different sure i'm sorry features |
|---|
| 0:02:46 | but what |
|---|
| 0:02:47 | right feature |
|---|
| 0:02:48 | it makes the speech |
|---|
| 0:02:51 | the nist sre ten show |
|---|
| 0:02:58 | i will take years but wasn't makes the |
|---|
| 0:03:03 | mastery power |
|---|
| 0:03:05 | right proposed the first neural network based |
|---|
| 0:03:09 | speaker embedding |
|---|
| 0:03:11 | i also has brought before |
|---|
| 0:03:15 | feature errors |
|---|
| 0:03:17 | final by a couple of its the |
|---|
| 0:03:24 | neural network based speaker embedding |
|---|
| 0:03:28 | is the |
|---|
| 0:03:29 | mainstream or coded |
|---|
| 0:03:32 | speaker recognition |
|---|
| 0:03:34 | and thus |
|---|
| 0:03:36 | first speaker |
|---|
| 0:03:37 | speaker mister a |
|---|
| 0:03:40 | t you know based structure |
|---|
| 0:03:43 | neural network structure |
|---|
| 0:03:46 | for |
|---|
| 0:03:47 | two part |
|---|
| 0:03:48 | first |
|---|
| 0:03:49 | the speech you will be cost |
|---|
| 0:03:53 | frame-level |
|---|
| 0:03:55 | representation |
|---|
| 0:03:56 | followed by rocks the |
|---|
| 0:03:58 | statistical pooling |
|---|
| 0:04:02 | been |
|---|
| 0:04:03 | there are two |
|---|
| 0:04:04 | second but |
|---|
| 0:04:05 | therefore |
|---|
| 0:04:07 | tends to who |
|---|
| 0:04:10 | and you're we |
|---|
| 0:04:12 | is true first |
|---|
| 0:04:13 | there |
|---|
| 0:04:14 | the combined than for others |
|---|
| 0:04:17 | speaker very |
|---|
| 0:04:20 | in this study |
|---|
| 0:04:22 | i for the |
|---|
| 0:04:23 | well |
|---|
| 0:04:25 | we praise |
|---|
| 0:04:25 | the |
|---|
| 0:04:26 | second it so there |
|---|
| 0:04:28 | you can with their |
|---|
| 0:04:30 | robust they're |
|---|
| 0:04:31 | according to |
|---|
| 0:04:36 | work |
|---|
| 0:04:37 | structure |
|---|
| 0:04:41 | in addition |
|---|
| 0:04:43 | i also used |
|---|
| 0:04:46 | attention layer too |
|---|
| 0:04:48 | you're right |
|---|
| 0:04:50 | the statistical pooling layer |
|---|
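
A statistics pooling layer, as commonly used in neural speaker embedding networks, collapses a variable number of frame-level vectors into one fixed-length utterance-level vector by concatenating the per-dimension mean and standard deviation. The sketch below is a plain-Python illustration under that assumption. The function name and the use of the population standard deviation are choices made here, not details confirmed by the talk.

```python
import math

def statistics_pooling(frames):
    """Pool frame-level vectors into [means..., standard deviations...]."""
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-dimension population standard deviation over all frames.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds
```

The output length is twice the frame dimension regardless of how many frames the utterance contains, which is what lets later layers operate on whole utterances.
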
| 0:04:53 | accordingly |
|---|
| 0:04:55 | what structure press at the receiver tension |
|---|
| 0:05:00 | speaker but |
|---|
| 0:05:07 | in this study |
|---|
| 0:05:09 | but i australian feature extraction are |
|---|
| 0:05:13 | based k to find a good features |
|---|
| 0:05:15 | for speaker embedding |
|---|
| 0:05:18 | through acoustic features there are trendy for all go far |
|---|
| 0:05:22 | the first, mel-frequency cepstral coefficient feature |
|---|
| 0:05:27 | i cory and three |
|---|
| 0:05:29 | basically |
|---|
| 0:05:30 | okay recognition |
|---|
| 0:05:33 | you know |
|---|
| 0:05:36 | the service |
|---|
| 0:05:37 | mel-scale filter be attach with each accordingly |
|---|
| 0:05:42 | p |
|---|
| 0:05:46 | to me |
|---|
| 0:05:47 | could be well it backwards with your check |
|---|
| 0:05:51 | four kinds of data augmentation |
|---|
| 0:05:54 | are used |
|---|
| 0:05:55 | took it seven |
|---|
| 0:05:56 | you cultural for each of the top |
|---|
| 0:06:00 | the you're saying and data points that if the |
|---|
| 0:06:03 | current to wrap |
|---|
| 0:06:05 | the original audio file |
|---|
| 0:06:07 | which each but between |
|---|
| 0:06:10 | no |
|---|
| 0:06:12 | utterance |
|---|
| 0:06:14 | no problems |
|---|
| 0:06:18 | in this thing |
|---|
| 0:06:20 | is the simulated impulse response |
|---|
| 0:06:24 | i used to cover all reaching or |
|---|
| 0:06:27 | right column |
|---|
| 0:06:29 | okay |
|---|
| 0:06:31 | right in aspects problems |
|---|
| 0:06:34 | so it |
|---|
| 0:06:35 | speech vision |
|---|
| 0:06:38 | try to one for speech |
|---|
| 0:06:40 | two |
|---|
| 0:06:41 | like that's |
|---|
| 0:06:44 | well just as a |
|---|
| 0:06:46 | original reach |
|---|
| 0:06:49 | the last |
|---|
| 0:06:50 | the that you a patient |
|---|
| 0:06:52 | original |
|---|
| 0:06:53 | what if i |
|---|
| 0:06:55 | gail |
|---|
| 0:06:56 | which the training data |
|---|
| 0:06:58 | very approach or four |
|---|
| 0:07:00 | but you advantage future or right |
|---|
| 0:07:04 | by using |
|---|
| 0:07:06 | such for kernel in addition |
|---|
| 0:07:10 | there are |
|---|
| 0:07:11 | seven corpus |
|---|
| 0:07:14 | origin |
|---|
| 0:07:14 | that are it |
|---|
| 0:07:22 | thus are train artificial |
|---|
| 0:07:26 | instead |
|---|
| 0:07:27 | nist sre |
|---|
| 0:07:29 | switchboard |
|---|
| 0:07:30 | bonastre |
|---|
| 0:07:31 | it aspect |
|---|
| 0:07:33 | that was therefore it after |
|---|
| 0:07:35 | do correctly for |
|---|
| 0:07:37 | q |
|---|
| 0:07:38 | we should okay first and sit |
|---|
| 0:07:42 | i for one clean speech |
|---|
| 0:07:45 | for our molding |
|---|
| 0:07:48 | one utterances from eighty six summon speaker |
|---|
| 0:07:52 | but i |
|---|
| 0:07:54 | it's a huge amount of it |
|---|
| 0:07:59 | well you material should it also nist sre sound and eight |
|---|
| 0:08:04 | it is i two so that night in a heartbeat |
|---|
| 0:08:09 | the most |
|---|
| 0:08:10 | available training data which |
|---|
| 0:08:13 | because the state yes |
|---|
| 0:08:16 | it can be expressed are all speech |
|---|
| 0:08:18 | you know in speech |
|---|
| 0:08:21 | only |
|---|
| 0:08:21 | well do you or but to me but |
|---|
| 0:08:26 | and i |
|---|
| 0:08:27 | so |
|---|
| 0:08:28 | it for me for feature extraction |
|---|
| 0:08:34 | right we are sure |
|---|
| 0:08:40 | a couple minutes the i it weights |
|---|
| 0:08:43 | there |
|---|
| 0:08:43 | national institute of standards |
|---|
| 0:08:46 | and technology matched speaker recognition evaluation task |
|---|
| 0:08:52 | sre it was sort of a start to that night |
|---|
| 0:08:59 | experimental results showed that the cost structure their decision cost function |
|---|
| 0:09:07 | well the |
|---|
| 0:09:08 | going segment |
|---|
| 0:09:09 | two |
|---|
| 0:09:10 | and |
|---|
| 0:09:10 | zero point |
|---|
| 0:09:13 | see |
|---|
| 0:09:13 | right |
|---|
| 0:09:14 | two |
|---|
| 0:09:17 | which the nist |
|---|
| 0:09:18 | this idea to start it |
|---|
| 0:09:21 | and decide to a nightly evaluation it has the respectively |
|---|
| 0:09:30 | this figure this table |
|---|
| 0:09:33 | chaudhari |
|---|
| 0:09:36 | well allows you know that |
|---|
| 0:09:39 | the best performance |
|---|
| 0:09:42 | there are fixed |
|---|
| 0:09:46 | i compare the first and second |
|---|
| 0:09:50 | segment variable speed but it |
|---|
| 0:09:53 | they also come |
|---|
| 0:09:55 | see if a feature |
|---|
| 0:09:58 | well all we can |
|---|
| 0:10:00 | fun |
|---|
| 0:10:02 | filled up in |
|---|
| 0:10:03 | these each feature |
|---|
| 0:10:06 | awful |
|---|
| 0:10:06 | you know this the feature |
|---|
| 0:10:11 | we also |
|---|
| 0:10:14 | so i |
|---|
| 0:10:15 | the first |
|---|
| 0:10:16 | segment i |
|---|
| 0:10:18 | speaker big be weighted a second |
|---|
| 0:10:22 | the speaker but something the second their speaker at |
|---|
| 0:10:29 | for the first their speaker |
|---|
| 0:10:32 | i |
|---|
| 0:10:34 | result |
|---|
| 0:10:35 | so |
|---|
| 0:10:36 | i think |
|---|
| 0:10:37 | both the speaker |
|---|
| 0:10:40 | first bears a bit |
|---|
| 0:10:42 | they for dimension of the image |
|---|
| 0:10:46 | we can use the score fusion |
|---|
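
Score fusion combines the trial scores of several systems into a single score per trial. One common form is a weighted sum, sketched below. The function name, the equal weights in the usage note, and the choice of a linear combination are assumptions for illustration, since the talk does not state which fusion method was used.

```python
def fuse_scores(system_scores, weights):
    """Weighted-sum fusion of per-system score lists given in the same trial order."""
    if len(system_scores) != len(weights):
        raise ValueError("one weight is required per system")
    n_trials = len(system_scores[0])
    if any(len(scores) != n_trials for scores in system_scores):
        raise ValueError("all systems must score the same trials")
    # For each trial, sum the weighted scores across systems.
    return [
        sum(w * scores[i] for w, scores in zip(weights, system_scores))
        for i in range(n_trials)
    ]
```

For example, `fuse_scores([[1.0, -2.0], [3.0, 0.0]], [0.5, 0.5])` averages two systems over two trials. In practice the weights are usually tuned on a development set rather than fixed by hand.
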
| 0:10:49 | okay vector itself |
|---|
| 0:10:58 | since |
|---|
| 0:10:59 | i file |
|---|
| 0:11:00 | filter bank feature was a feature vector function |
|---|
| 0:11:07 | and also be noted that the cost fifty and draws attention c |
|---|
| 0:11:12 | and eighty dollars |
|---|
| 0:11:15 | we so what role |
|---|
| 0:11:20 | extent also mention i'll for sure |
|---|
| 0:11:23 | the next frame |
|---|
| 0:11:24 | therefore it should |
|---|
| 0:11:28 | what are trained based on |
|---|
| 0:11:30 | the pen |
|---|
| 0:11:32 | each feature |
|---|
| 0:11:34 | this type of show |
|---|
| 0:11:36 | we can find |
|---|
| 0:11:37 | by using white |
|---|
| 0:11:39 | role |
|---|
| 0:11:40 | for in |
|---|
| 0:11:41 | they will refer to ensure |
|---|
| 0:11:43 | we can pick the performance |
|---|
| 0:11:51 | finally |
|---|
| 0:11:52 | well so that all call |
|---|
| 0:11:55 | and ninety six and |
|---|
| 0:11:57 | by using expensive but it is that file and feature and then it is the |
|---|
| 0:12:02 | back-end scoring |
|---|
| 0:12:05 | why final submission |
|---|
| 0:12:08 | that is |
|---|
| 0:12:10 | where it is |
|---|
| 0:12:12 | much |
|---|
| 0:12:14 | each year suspension |
|---|
| 0:12:17 | bic we wish the |
|---|
| 0:12:19 | so q two |
|---|
| 0:12:21 | one two cards |
|---|
| 0:12:24 | once your feet it's |
|---|
| 0:12:28 | do you got but not for right |
|---|
| 0:12:33 | for |
|---|
| 0:12:35 | pretty much are you |
|---|
| 0:12:38 | this table show |
|---|
| 0:12:40 | by the final file for this site tools on it |
|---|
| 0:12:44 | it is i thought it right |
|---|
| 0:12:48 | you deterioration |
|---|
| 0:12:55 | that we show that a portion |
|---|
| 0:13:01 | this paper to use that system so |
|---|
| 0:13:04 | to a |
|---|
| 0:13:05 | next slide so that night |
|---|
| 0:13:08 | cts task |
|---|
| 0:13:09 | i'm scroll neural network |
|---|
| 0:13:12 | structure |
|---|
| 0:13:13 | which operates on india and at a at least |
|---|
| 0:13:17 | and you know extra tight shot |
|---|
| 0:13:20 | it showed up and have your |
|---|
| 0:13:23 | and you may speak at |
|---|
| 0:13:24 | there and sixty you know the lp and feature analysis |
|---|
| 0:13:30 | i used |
|---|
| 0:13:32 | channel that's k |
|---|
| 0:13:33 | we did |
|---|
| 0:13:34 | feature |
|---|
| 0:13:36 | mixer six sre |
|---|
| 0:13:38 | so which what a watch therefore |
|---|
| 0:13:41 | that one |
|---|
| 0:13:42 | be a huge |
|---|
| 0:13:44 | six |
|---|
| 0:13:46 | no prior for |
|---|
| 0:13:48 | because our compensation is that what we |
|---|
| 0:13:52 | be well in that the of available training there |
|---|
| 0:13:57 | the proposed mixer shooter it should |
|---|
| 0:14:01 | this year |
|---|
| 0:14:02 | score |
|---|
| 0:14:03 | you or initially suitable for |
|---|
| 0:14:07 | to zero |
|---|
| 0:14:09 | contrary nine five |
|---|
| 0:14:12 | the |
|---|
| 0:14:12 | next |
|---|
| 0:14:13 | this idea to start at sre two thousand nine that the original dataset back |
|---|
| 0:14:22 | thank you |
|---|
| 0:14:23 | thank you very much |
|---|