| 0:00:14 | hello everyone my name is [unintelligible] from the university of [unintelligible] |
|---|
| 0:00:19 | today |
|---|
| 0:00:19 | i will talk about |
|---|
| 0:00:21 | phase spectrum of time-flipped speech signals for robust spoofing detection |
|---|
| 0:00:28 | first |
|---|
| 0:00:28 | let me introduce spoofing detection for automatic speaker verification |
|---|
| 0:00:36 | ASV |
|---|
| 0:00:37 | which is short |
|---|
| 0:00:38 | for |
|---|
| 0:00:38 | automatic speaker verification |
|---|
| 0:00:41 | has a vulnerability |
|---|
| 0:00:44 | the vulnerability of ASV means |
|---|
| 0:00:47 | it is |
|---|
| 0:00:48 | vulnerable to spoofing attacks |
|---|
| 0:00:52 | a spoofing attack is an attempt to deceive the ASV system |
|---|
| 0:00:56 | with spoofed utterances |
|---|
| 0:01:00 | a spoofed utterance is an utterance artificially produced to sound like the target speaker's |
|---|
| 0:01:07 | utterance |
|---|
| 0:01:09 | so |
|---|
| 0:01:10 | the impostor speaker |
|---|
| 0:01:11 | who attempts |
|---|
| 0:01:13 | a spoofing attack |
|---|
| 0:01:14 | can be accepted as the target speaker |
|---|
| 0:01:20 | there are some types of spoofing attacks |
|---|
| 0:01:22 | which can actually be detected |
|---|
| 0:01:26 | text-to-speech synthesis |
|---|
| 0:01:28 | voice conversion |
|---|
| 0:01:30 | and |
|---|
| 0:01:31 | replay attack |
|---|
| 0:01:35 | spoofing detection is a task to distinguish |
|---|
| 0:01:39 | whether a given utterance is |
|---|
| 0:01:41 | a genuine utterance |
|---|
| 0:01:43 | or a spoofed utterance |
|---|
| 0:01:46 | the identity claim |
|---|
| 0:01:48 | with a spoofed utterance is rejected |
|---|
| 0:01:52 | regardless of |
|---|
| 0:01:54 | how similar the utterance is to the target speaker's utterance |
|---|
| 0:02:00 | therefore |
|---|
| 0:02:01 | spoofing detection can protect the ASV system |
|---|
| 0:02:05 | against |
|---|
| 0:02:07 | various spoofing attacks |
|---|
| 0:02:11 | for detecting spoofing attacks |
|---|
| 0:02:13 | we should capture the differences of the frequency response well |
|---|
| 0:02:17 | as shown in this figure |
|---|
| 0:02:20 | the frequency responses |
|---|
| 0:02:22 | between genuine utterances and spoofed utterances |
|---|
| 0:02:27 | are different |
|---|
| 0:02:28 | for example |
|---|
| 0:02:30 | spoofed utterances produced by replay attack |
|---|
| 0:02:33 | contain the attributes |
|---|
| 0:02:35 | of the devices |
|---|
| 0:02:36 | used for the replay attack |
|---|
| 0:02:38 | such as the playback device |
|---|
| 0:02:40 | and the recording device |
|---|
| 0:02:42 | also spoofed utterances |
|---|
| 0:02:44 | produced by speech synthesis and voice conversion methods |
|---|
| 0:02:48 | do not contain the proper dynamic information and the phase information of genuine utterances |
|---|
| 0:02:56 | in many studies |
|---|
| 0:02:58 | convolutional neural networks |
|---|
| 0:03:00 | have been used to capture the differences in frequency responses |
|---|
| 0:03:04 | in spectrum based acoustic features |
|---|
| 0:03:11 | as a side note |
|---|
| 0:03:13 | i will describe the spectrum of a speech signal briefly |
|---|
| 0:03:18 | the spectrum of a speech signal |
|---|
| 0:03:21 | is |
|---|
| 0:03:23 | composed of |
|---|
| 0:03:24 | two kinds of spectrum |
|---|
| 0:03:26 | one is magnitude spectrum |
|---|
| 0:03:29 | and the other is the phase spectrum |
|---|
| 0:03:33 | magnitude spectrum based features have been widely used for spoofing detection |
|---|
| 0:03:40 | there are several kinds of magnitude spectrum based features |
|---|
| 0:03:44 | such as log power spectrum |
|---|
| 0:03:48 | constant q cepstral coefficients |
|---|
| 0:03:51 | linear frequency cepstral coefficients |
|---|
| 0:03:54 | and so on |
|---|
| 0:03:57 | whereas |
|---|
| 0:03:58 | phase spectrum based features are less used than |
|---|
| 0:04:02 | magnitude spectrum based features |
|---|
| 0:04:06 | however |
|---|
| 0:04:07 | the phase spectrum based features |
|---|
| 0:04:09 | contain |
|---|
| 0:04:10 | useful information for spoofing detection |
|---|
| 0:04:13 | that is not contained in the magnitude spectrum |
|---|
| 0:04:17 | in our research |
|---|
| 0:04:19 | we focused on phase spectrum |
|---|
| 0:04:21 | especially |
|---|
| 0:04:23 | we used |
|---|
| 0:04:24 | group delay |
|---|
| 0:04:26 | as a phase spectrum based feature |
|---|
| 0:04:28 | the group delay is defined |
|---|
| 0:04:31 | as |
|---|
| 0:04:31 | this equation |
|---|
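
The equation shown on the slide is not captured in this transcript. In its usual textbook form, group delay is the negative derivative of the phase spectrum with respect to frequency. The sketch below is one common way to compute it without phase unwrapping. It assumes a 512-point FFT, which yields the 257 bins mentioned later in the talk, and the function name `group_delay` is hypothetical rather than taken from the speaker's code.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-10):
    """Group delay of one frame, in samples, at n_fft // 2 + 1 frequency bins."""
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)       # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)   # spectrum of n * x[n]
    # tau(w) = -d(theta)/d(w) = Re(Y * conj(X)) / |X|^2
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# A pure delay of 5 samples has a flat group delay of 5 at every bin.
impulse = np.zeros(400)
impulse[5] = 1.0
tau = group_delay(impulse)
print(tau.shape)
```

The small `eps` term only guards against division by zero at spectral nulls; it is an implementation choice, not part of the definition.
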
| 0:04:35 | in this section i will introduce our proposed method |
|---|
| 0:04:40 | first i will explain |
|---|
| 0:04:42 | time flipping for the phase spectrum |
|---|
| 0:04:46 | the magnitude spectrum is not affected by the time order of the signal |
|---|
| 0:04:52 | so |
|---|
| 0:04:53 | the magnitude spectra of |
|---|
| 0:04:55 | the original signal and the time-flipped signal |
|---|
| 0:04:59 | are the same |
|---|
| 0:05:00 | however |
|---|
| 0:05:01 | the phase spectrum is changed |
|---|
| 0:05:04 | when the time order of the signal is flipped |
|---|
| 0:05:07 | it means that |
|---|
| 0:05:09 | the attributes of the phase spectrum are changed |
|---|
| 0:05:12 | when the time order of the signal is flipped |
|---|
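
The claim above can be checked numerically. The sketch below uses a random real-valued frame as a stand-in for speech (an assumption for illustration only) and compares the magnitude and phase of its spectrum with those of its time-reversed copy.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)   # stand-in for one real-valued speech frame
x_flip = x[::-1]               # time-flipped copy

X = np.fft.rfft(x)
X_flip = np.fft.rfft(x_flip)

# Flipping a real signal conjugates its spectrum up to a linear phase term,
# so the magnitudes match while the phases do not.
same_magnitude = np.allclose(np.abs(X), np.abs(X_flip))
same_phase = np.allclose(np.angle(X), np.angle(X_flip))
print(same_magnitude, same_phase)
```
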
| 0:05:16 | based on this fact |
|---|
| 0:05:19 | we assume that when the time order of the signal is flipped |
|---|
| 0:05:24 | the attributes that are not related to spoofing attacks |
|---|
| 0:05:29 | such as language information and |
|---|
| 0:05:32 | right information |
|---|
| 0:05:35 | are changed |
|---|
| 0:05:37 | in contrast |
|---|
| 0:05:39 | the attributes |
|---|
| 0:05:40 | that are related to spoofing attacks |
|---|
| 0:05:43 | such as the playback device information and the recording device information |
|---|
| 0:05:48 | are not changed |
|---|
| 0:05:51 | motivated by this assumption |
|---|
| 0:05:55 | we propose a method |
|---|
| 0:05:57 | using |
|---|
| 0:05:58 | two types of phase spectrum based features together |
|---|
| 0:06:03 | until now |
|---|
| 0:06:05 | conventional spoofing detection systems |
|---|
| 0:06:07 | have used phase spectrum based features |
|---|
| 0:06:10 | from the original signal only |
|---|
| 0:06:13 | in our research we use |
|---|
| 0:06:15 | not only a phase spectrum based feature from the original signal but also |
|---|
| 0:06:21 | a feature |
|---|
| 0:06:23 | from the time-flipped signal |
|---|
| 0:06:28 | if a raw some holes |
|---|
| 0:06:31 | we can generate |
|---|
| 0:06:33 | new speech signals |
|---|
| 0:06:35 | having unseen intra-class conditions |
|---|
| 0:06:39 | by using the proposed method |
|---|
| 0:06:42 | and |
|---|
| 0:06:44 | using both |
|---|
| 0:06:45 | of them together |
|---|
| 0:06:46 | has the effect of reducing intra-class variance more efficiently |
|---|
| 0:06:53 | which results in |
|---|
| 0:06:54 | performance improvements |
|---|
| 0:06:58 | by using two types of features at one time |
|---|
| 0:07:02 | we propose three kinds of feature combination methods |
|---|
| 0:07:07 | before introducing the feature combination methods |
|---|
| 0:07:11 | i will introduce our baseline |
|---|
| 0:07:13 | CNN based model first |
|---|
| 0:07:19 | of course you can use any kind of CNN based model |
|---|
| 0:07:23 | in our research |
|---|
| 0:07:25 | we used |
|---|
| 0:07:26 | SE-ResNet34 |
|---|
| 0:07:28 | as the CNN based model |
|---|
| 0:07:32 | SE-ResNet34 |
|---|
| 0:07:33 | is a version of ResNet |
|---|
| 0:07:35 | where |
|---|
| 0:07:36 | SE blocks are integrated into each residual block |
|---|
| 0:07:41 | for recalibrating |
|---|
| 0:07:43 | channel-wise responses |
|---|
| 0:07:45 | and SE-ResNet34 was highly ranked in the ASVspoof 2019 |
|---|
| 0:07:50 | challenge |
|---|
| 0:07:55 | one combination method |
|---|
| 0:07:57 | is |
|---|
| 0:07:58 | two-channel input |
|---|
| 0:08:00 | where |
|---|
| 0:08:01 | two types of features |
|---|
| 0:08:03 | compose |
|---|
| 0:08:04 | one input |
|---|
| 0:08:07 | another combination method is embedding-level combination |
|---|
| 0:08:13 | the embedding |
|---|
| 0:08:17 | corresponds to |
|---|
| 0:08:17 | the output of the global average pooling |
|---|
| 0:08:23 | this method can be divided into three methods |
|---|
| 0:08:27 | the first method is |
|---|
| 0:08:29 | concatenating two embeddings |
|---|
| 0:08:33 | to make up one embedding vector |
|---|
| 0:08:36 | the second method is to compute the element-wise maximum of two embeddings |
|---|
| 0:08:43 | the third method is to compute the element-wise average of two embeddings |
|---|
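
The three embedding-level variants described above can be sketched as follows. This is a minimal illustration on plain NumPy vectors, assuming each embedding is the pooled CNN output for one input; the function name `combine_embeddings` and the mode labels are hypothetical, not the names used in the talk.

```python
import numpy as np

def combine_embeddings(emb_orig, emb_flip, mode):
    """Combine embeddings from the original and the time-flipped feature."""
    emb_orig = np.asarray(emb_orig, dtype=float)
    emb_flip = np.asarray(emb_flip, dtype=float)
    if mode == "concat":   # one longer embedding vector
        return np.concatenate([emb_orig, emb_flip], axis=-1)
    if mode == "max":      # element-wise maximum
        return np.maximum(emb_orig, emb_flip)
    if mode == "mean":     # element-wise average
        return (emb_orig + emb_flip) / 2.0
    raise ValueError(f"unknown mode: {mode}")

e_orig = [1.0, 4.0, 2.0]
e_flip = [3.0, 0.0, 2.0]
concat = combine_embeddings(e_orig, e_flip, "concat")
e_max = combine_embeddings(e_orig, e_flip, "max")
e_mean = combine_embeddings(e_orig, e_flip, "mean")
```

Note that concatenation doubles the embedding size, so the classifier that follows needs a matching input dimension, while the maximum and average keep the original size.
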
| 0:08:51 | the other combination method is feature map level combination |
|---|
| 0:08:56 | the feature map corresponds to |
|---|
| 0:08:58 | the output of the CNN |
|---|
| 0:09:01 | before computing embeddings |
|---|
| 0:09:04 | we compute the element-wise |
|---|
| 0:09:06 | maximum of two feature maps |
|---|
| 0:09:10 | and then compute the embedding from the combined feature map |
|---|
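
A minimal sketch of the feature map level combination, assuming each feature map has shape `(channels, frequency, time)` and that the embedding is obtained by global average pooling as described for the embedding-level methods. The function name `combine_feature_maps` is hypothetical.

```python
import numpy as np

def combine_feature_maps(fmap_orig, fmap_flip):
    """Element-wise maximum of two feature maps, then global average pooling."""
    combined = np.maximum(fmap_orig, fmap_flip)   # (channels, freq, time)
    return combined.mean(axis=(1, 2))             # one value per channel

fmap_a = np.zeros((2, 2, 2))
fmap_a[0] = 2.0                 # channel 0 is larger in the first map
fmap_b = np.ones((2, 2, 2))     # channel 1 is larger in the second map
embedding = combine_feature_maps(fmap_a, fmap_b)
```
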
| 0:09:16 | next |
|---|
| 0:09:17 | i will describe the experiments and the results |
|---|
| 0:09:23 | we used the ASVspoof 2019 |
|---|
| 0:09:26 | logical access |
|---|
| 0:09:27 | and physical access scenario databases |
|---|
| 0:09:32 | they are widely used |
|---|
| 0:09:35 | databases in the field of spoofing detection |
|---|
| 0:09:40 | logical access |
|---|
| 0:09:43 | covers the detection of speech synthesis and voice conversion |
|---|
| 0:09:47 | physical access |
|---|
| 0:09:49 | covers the detection of replay attack |
|---|
| 0:09:55 | as the acoustic feature we used |
|---|
| 0:09:58 | a |
|---|
| 0:09:58 | 257-dimensional |
|---|
| 0:10:01 | group |
|---|
| 0:10:01 | delay |
|---|
| 0:10:04 | as the input for the CNN |
|---|
| 0:10:07 | for each utterance |
|---|
| 0:10:10 | we extract |
|---|
| 0:10:12 | two types of group delay |
|---|
| 0:10:16 | one is from the original utterance |
|---|
| 0:10:19 | and the other is from |
|---|
| 0:10:21 | the time-flipped utterance |
|---|
| 0:10:25 | after the feature extraction we divided each |
|---|
| 0:10:29 | variable length feature |
|---|
| 0:10:31 | into fixed length |
|---|
| 0:10:32 | segments |
|---|
| 0:10:34 | to handle |
|---|
| 0:10:35 | variable lengths of utterances |
|---|
| 0:10:38 | in our experiments we set the segment |
|---|
| 0:10:41 | length to four hundred frames |
|---|
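
The segmentation step can be sketched as below. The talk does not say how utterances shorter than 400 frames or leftover frames are handled, so this sketch assumes short features are repeated up to one segment and trailing frames are dropped. Both choices, and the function name `split_into_segments`, are assumptions for illustration.

```python
import numpy as np

def split_into_segments(feature, seg_len=400):
    """Split a (bins, frames) feature into (n_segments, bins, seg_len)."""
    n_bins, n_frames = feature.shape
    if n_frames < seg_len:
        # Assumption: repeat a short feature until it fills one segment.
        reps = -(-seg_len // n_frames)   # ceiling division
        feature = np.tile(feature, (1, reps))[:, :seg_len]
        n_frames = seg_len
    n_segs = n_frames // seg_len         # assumption: drop the remainder
    return np.stack(
        [feature[:, i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]
    )

long_feat = np.zeros((257, 1000))    # 1000 frames -> 2 full segments
short_feat = np.ones((257, 150))     # 150 frames -> 1 repeated segment
long_segs = split_into_segments(long_feat)
short_segs = split_into_segments(short_feat)
```
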
| 0:10:47 | we used two evaluation metrics |
|---|
| 0:10:51 | one is |
|---|
| 0:10:53 | EER |
|---|
| 0:10:54 | and the other is |
|---|
| 0:10:55 | t-DCF |
|---|
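
As a reminder of what the first metric measures, the sketch below estimates the equal error rate (EER) from two score lists by sweeping a threshold and picking the point where the false acceptance and false rejection rates are closest. It is a simple illustrative approximation, not the official ASVspoof scoring script, and it assumes higher scores mean "more likely genuine". The t-DCF additionally depends on the errors of the ASV system and is not reproduced here.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over all observed scores."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: spoof scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

eer_separable = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
eer_overlap = compute_eer([0.0, 1.0], [0.0, 1.0])
```
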
| 0:10:59 | this table shows the performances |
|---|
| 0:11:02 | on the LA scenario |
|---|
| 0:11:04 | we highlight the best performance in bold |
|---|
| 0:11:08 | we mean that so |
|---|
| 0:11:10 | showed the best performance on evaluation trials |
|---|
| 0:11:16 | and the f next method |
|---|
| 0:11:18 | showed |
|---|
| 0:11:19 | the best performance on development trials |
|---|
| 0:11:23 | our proposed methods |
|---|
| 0:11:25 | generally showed better performances than |
|---|
| 0:11:28 | baseline |
|---|
| 0:11:33 | this table shows |
|---|
| 0:11:34 | the |
|---|
| 0:11:36 | performances |
|---|
| 0:11:38 | on the PA trials |
|---|
| 0:11:41 | the proposed methods showed better performances than the baseline |
|---|
| 0:11:47 | except the EER of |
|---|
| 0:11:50 | the two-channel method on the development trials |
|---|
| 0:11:55 | we mismatch sources |
|---|
| 0:11:57 | showed the best performance on both development and evaluation trials |
|---|
| 0:12:05 | in the beginning |
|---|
| 0:12:06 | we mentioned that |
|---|
| 0:12:09 | the magnitude spectrum and the phase spectrum contain different information |
|---|
| 0:12:14 | so |
|---|
| 0:12:16 | we also built |
|---|
| 0:12:17 | the baseline systems that |
|---|
| 0:12:19 | use |
|---|
| 0:12:20 | a magnitude spectrum based feature |
|---|
| 0:12:23 | in our research |
|---|
| 0:12:26 | we used log power spectrum |
|---|
| 0:12:29 | as the magnitude spectrum based feature |
|---|
| 0:12:33 | these baseline systems |
|---|
| 0:12:35 | are for fusion with the systems |
|---|
| 0:12:38 | that use |
|---|
| 0:12:39 | the phase spectrum based feature |
|---|
| 0:12:43 | by fusion |
|---|
| 0:12:45 | we can utilize information of both |
|---|
| 0:12:48 | the magnitude and phase spectrum |
|---|
| 0:12:52 | we used score-level fusion |
|---|
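
Score-level fusion combines the output scores of two systems for the same trials. The talk does not state the fusion rule, so the sketch below assumes a simple weighted average with equal weights. The function name `fuse_scores` and the `weight` parameter are hypothetical.

```python
import numpy as np

def fuse_scores(mag_scores, phase_scores, weight=0.5):
    """Weighted average of per-trial scores from two systems."""
    mag = np.asarray(mag_scores, dtype=float)
    phase = np.asarray(phase_scores, dtype=float)
    return weight * mag + (1.0 - weight) * phase

# Scores for the same two trials from the magnitude and phase systems.
fused = fuse_scores([1.0, -1.0], [3.0, 1.0])
```

In practice the two score sets may need normalization to a common range before averaging; that step is omitted here.
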
| 0:12:58 | this table shows |
|---|
| 0:12:59 | the performances |
|---|
| 0:13:01 | of the baseline systems that |
|---|
| 0:13:04 | use |
|---|
| 0:13:04 | the magnitude spectrum based feature as input |
|---|
| 0:13:11 | this table shows |
|---|
| 0:13:12 | the performances of the fused systems |
|---|
| 0:13:16 | on the LA scenario |
|---|
| 0:13:20 | all the systems |
|---|
| 0:13:22 | showed better performances than before fusion |
|---|
| 0:13:27 | the same trend can be seen |
|---|
| 0:13:30 | in the results |
|---|
| 0:13:32 | of the fused systems |
|---|
| 0:13:34 | on the PA scenario |
|---|
| 0:13:39 | finally the conclusions |
|---|
| 0:13:43 | the conventional method |
|---|
| 0:13:44 | utilizes the phase spectrum |
|---|
| 0:13:47 | from the original signal only |
|---|
| 0:13:50 | in contrast the proposed method |
|---|
| 0:13:53 | utilizes the phase spectrum |
|---|
| 0:13:55 | from the original and the time-flipped signals together |
|---|
| 0:14:02 | it has the effect of reducing intra-class variance |
|---|
| 0:14:08 | and |
|---|
| 0:14:09 | shows robust performance |
|---|
| 0:14:13 | additionally |
|---|
| 0:14:15 | we can achieve |
|---|
| 0:14:16 | better performances |
|---|
| 0:14:17 | by fusion with the systems that use |
|---|
| 0:14:21 | the magnitude spectrum based |
|---|
| 0:14:23 | feature |
|---|
| 0:14:26 | thank you for watching my presentation |
|---|
| 0:14:29 | goodbye |
|---|