0:00:13[unclear] we are your guide for the next twenty
0:00:17minutes. If you have questions, please press the pause button whenever you want.
0:00:23Meanwhile, listen [unclear].
0:00:38Okay.
0:00:39This work was done together with [co-author's name unclear].
0:00:43We work on the effect of the waveform PMF on anti-spoofing detection;
0:00:48this time it is for replay data, or the physical condition.
0:00:52It is a continuation of the work we did on the same
0:00:56challenge
0:00:58for the logical condition.
0:01:01We define the problem and the motivation for why to use the waveform.
0:01:06We will show several examples
0:01:09of the waveform PMF,
0:01:11and
0:01:13will describe the genuinization process,
0:01:16which changes the PMF of the replay data,
0:01:20and show how it affects
0:01:22the
0:01:23anti-spoofing recognition, and other effects.
0:01:29After the examples we show the results of the evaluation, and then we conclude.
0:01:37So
0:01:39we can
0:01:39define the problem as follows:
0:01:42to
0:01:44classify a speech segment as either genuine speech
0:01:48or spoofed speech.
0:01:50In general, spoofed speech can be synthesized, or replayed,
0:01:55or produced in any other way, but this work will focus on the replay data.
0:02:04The motivation for this work is the fact that a lot of work has been
0:02:08done on anti-spoofing in the frequency domain.
0:02:12Many features were applied, like MFCC, CQCC, LFCC,
0:02:19and more,
0:02:21but not much has been done in the time domain.
0:02:25We want to learn what happens
0:02:28with the time-domain statistics of the waveform
0:02:32and see how
0:02:36we can find differences between the genuine speech and the spoofed speech.
0:02:42So let's take an example
0:02:45of a speech segment
0:02:48and see what we get.
0:02:50If we look at the waveform in the plot above,
0:02:54we see a genuine speech segment,
0:02:57and then
0:02:58we want to find the probability mass function (PMF)
0:03:02of the amplitudes.
0:03:04This segment was sampled with
0:03:07sixteen bits.
0:03:09We also normalize,
0:03:11so we have two to the power of sixteen uniformly spaced levels between minus one and one.
0:03:21We show here only the values
0:03:24in the
0:03:25narrow range between minus zero one three and zero one three.
0:03:31It can be seen that the
0:03:33distribution
0:03:34of the samples is
0:03:35very similar to a Laplace or normal distribution,
0:03:40and this is well known in the literature, at least for speech.
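A minimal sketch of how such a waveform PMF could be estimated from 16-bit samples. The function name `waveform_pmf` and the synthetic Laplace-distributed signal are assumptions made for illustration; the talk does not describe the implementation or provide data.

```python
import numpy as np

def waveform_pmf(samples_int16):
    """Empirical PMF over all 2**16 amplitude levels of 16-bit audio."""
    # Shift int16 values (-32768..32767) to non-negative bin indices.
    idx = samples_int16.astype(np.int64) + 32768
    counts = np.bincount(idx, minlength=2**16)
    pmf = counts / counts.sum()
    # Normalized amplitude of each level, in [-1, 1).
    levels = (np.arange(2**16) - 32768) / 32768.0
    return levels, pmf

# Synthetic stand-in for a speech segment (an assumption, not real data):
# Laplace-distributed amplitudes rounded to 16-bit integers.
rng = np.random.default_rng(0)
raw = rng.laplace(loc=0.0, scale=1000.0, size=50_000)
x = np.clip(np.round(raw), -32768, 32767).astype(np.int16)
levels, pmf = waveform_pmf(x)
```

Plotting `pmf` against `levels` over a narrow range around zero would give the kind of picture discussed here.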
0:03:46Now
0:03:47let's take the samples from the evaluation set of ASVspoof
0:03:54twenty nineteen, physical condition.
0:03:58We
0:04:00evaluated the PMF
0:04:02of the genuine speech, the plot above,
0:04:05and of the spoofed speech, the plot
0:04:09below,
0:04:10and we see that there is a
0:04:12big difference between them,
0:04:14especially
0:04:17around zero.
0:04:19So
0:04:21it can turn out
0:04:23to be easy, even
0:04:25for a human, only by looking at the PMF,
0:04:30to distinguish between
0:04:32these two
0:04:34classes, genuine and
0:04:36replay data.
0:04:38So if you want to make a good spoofing attack,
0:04:41of course it should not be so easy to distinguish between them,
0:04:47and we would like to have similar distributions for both classes.
0:04:52This process we named
0:04:54genuinization.
0:04:59We will start with continuous random variables
0:05:02and go through a simple example
0:05:07to show how we
0:05:09genuinize
0:05:11our samples.
0:05:13So assume we have a
0:05:16source PDF
0:05:18and
0:05:19we want to make a transformation so that it will have
0:05:24the
0:05:26destination
0:05:28PDF.
0:05:30So we have
0:05:31two probability distribution functions,
0:05:33of the source
0:05:35and of the destination.
0:05:37In our case the source is the spoofed speech while the destination is the genuine
0:05:43speech, since we want to convert the
0:05:47spoofed
0:05:48segment to have the same statistics as the genuine speech.
0:05:54So first, for every sample
0:05:58from the spoofed speech,
0:06:01we will find the
0:06:05value of the
0:06:07CDF.
0:06:09Then we will go to the genuine speech CDF and find
0:06:14where it has the same value
0:06:17of the CDF,
0:06:19and the amplitude
0:06:21that we find there
0:06:24will be the new value of the sample.
0:06:27So
0:06:28instead of alpha zero,
0:06:30the spoofed speech will have a new value of beta zero.
0:06:37As simple as that.
0:06:38And this procedure we can do sample by sample for all the samples in the
0:06:44spoofed speech.
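For the continuous case, the mapping described above can be written compactly. The symbols $F_S$ (source CDF), $F_D$ (destination CDF), $x$ (original sample) and $y$ (converted sample) are notation assumed here for illustration, not taken from the talk, and the argument assumes both CDFs are continuous and strictly increasing.

```latex
\begin{align}
y &= F_D^{-1}\bigl(F_S(x)\bigr), \\
P(Y \le t) &= P\bigl(F_D^{-1}(F_S(X)) \le t\bigr)
            = P\bigl(F_S(X) \le F_D(t)\bigr)
            = F_D(t).
\end{align}
```

The last equality holds because $F_S(X)$ is uniformly distributed on $[0, 1]$, so the converted samples follow the destination distribution.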
0:06:47Of course, in our case the distributions are not continuous but
0:06:52discrete,
0:06:54and the algorithm is a little bit more complicated.
0:06:59In the discrete case
0:07:01the line is not smooth anymore but is discontinuous,
0:07:06and
0:07:07it looks like steps.
0:07:09So for each sample from the spoofed speech
0:07:13we see the value of the
0:07:16CMF, the cumulative mass function,
0:07:20and now we move to the genuine CMF,
0:07:24and there is not exactly a step with the same value in the same place,
0:07:29so we decided to take the lower bound.
0:07:32In this case,
0:07:34for the sample value four we
0:07:39still have four as the new value, but it's not true for every sample,
0:07:46so it can change from sample to sample.
0:07:50And of course we do it
0:07:52for all the samples, with the exact boundaries
0:07:56of the quantization levels, in our case sixteen bits.
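A rough sketch of the discrete mapping with a lower bound. The exact rule used in the talk is not spelled out, so the interpretation here (take the largest destination level whose CMF does not exceed the source CMF value) is an assumption, as are the function name `lower_bound_mapping` and the toy PMFs.

```python
import numpy as np

def lower_bound_mapping(pmf_src, pmf_dst, tol=1e-12):
    """For each source level, pick a destination level by CMF lower bound."""
    cmf_src = np.cumsum(pmf_src)
    cmf_dst = np.cumsum(pmf_dst)
    # Count destination steps whose CMF is <= the source CMF value, then
    # step back by one to get the index of the largest such level.
    idx = np.searchsorted(cmf_dst, cmf_src + tol, side="right") - 1
    # If no destination step lies below, fall back to the lowest level.
    return np.clip(idx, 0, len(pmf_dst) - 1)

# Toy example: 3 source (spoofed) levels, 4 destination (genuine) levels.
pmf_spoof = np.array([0.125, 0.625, 0.25])
pmf_genuine = np.array([0.25, 0.25, 0.25, 0.25])
mapping = lower_bound_mapping(pmf_spoof, pmf_genuine)

# Apply the table sample by sample (samples are source level indices).
spoof_samples = np.array([1, 1, 0, 2, 1])
converted = mapping[spoof_samples]
```

In this toy case destination level 1 is never produced, because one large source step spans several destination steps.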
0:08:02So first,
0:08:04the logical condition,
0:08:07and we see the results.
0:08:10The graph above
0:08:11is the graph of the
0:08:13spoofed speech,
0:08:15while in the middle is the graph of the spoofed speech
0:08:19after the genuinization process,
0:08:22and below is the
0:08:24PMF of the genuine speech.
0:08:27We can see that the algorithm works well
0:08:29and the
0:08:31genuinized speech indeed
0:08:33is similar to the genuine speech.
0:08:37However, when we try to apply the same algorithm
0:08:41to the physical condition,
0:08:44we have a phenomenon
0:08:46that
0:08:49in the genuinized speech, in the middle,
0:08:52we have something like a bunch around zero.
0:08:58Note that on the y-axis of the PMF
0:09:02of the spoofed speech
0:09:03the maximum is zero one, while in the other graphs the maximum is zero one four,
0:09:10in order to make it better visible. But we see that
0:09:16the genuinized speech is far away from the genuine speech.
0:09:22This phenomenon was strange and we wanted to
0:09:26understand what happened.
0:09:29So we can see in this figure that
0:09:34around zero, in the spoofed speech,
0:09:37we have a very big
0:09:39jump, corresponding
0:09:41to several
0:09:44levels
0:09:46of
0:09:48the PMF of the genuine speech.
0:09:52So when we
0:09:54convert
0:09:56the spoofed speech to genuine-like speech by the genuinization process,
0:10:02the levels in this example,
0:10:06four and five,
0:10:08cannot be reached
0:10:11in the genuinized PMF.
0:10:14So to overcome this
0:10:17problem
0:10:18we can simply increase the resolution of the speech:
0:10:22before the genuinization of the spoofed speech
0:10:24we add to it a small noise,
0:10:28and in such a way
0:10:30we have more steps,
0:10:32more levels available in the spoofed speech. In this example
0:10:36we added
0:10:37three bits
0:10:39of uniform noise,
0:10:42so we have
0:10:45eight times more
0:10:48discrete levels,
0:10:49and that is a lot more. In this way we can now reach
0:10:55any level
0:10:56in the genuine speech.
0:10:59In our case,
0:11:00in the real experiment,
0:11:03to the sixteen bits we added noise of five bits.
0:11:07It means
0:11:09each level
0:11:11is now split into
0:11:13thirty-two sub-levels.
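A small sketch of how uniform noise of a few extra bits could split each 16-bit level into finer sub-levels. The function name `add_subbit_noise` and the way the noise is injected (scaling by two to the power of the extra bits and adding a uniform integer offset) are assumptions for illustration; the talk only states that uniform noise of three bits (example) or five bits (experiment) was added.

```python
import numpy as np

def add_subbit_noise(samples_int16, extra_bits, rng):
    """Refine 16-bit samples to (16 + extra_bits)-bit resolution with uniform noise."""
    fine = samples_int16.astype(np.int64) * (1 << extra_bits)
    # Each original level is split into 2**extra_bits equally likely sub-levels.
    return fine + rng.integers(0, 1 << extra_bits, size=fine.shape)

rng = np.random.default_rng(0)
# Toy signal concentrated at zero, loosely mimicking a PMF with a big step there.
x = np.array([0, 0, 0, 0, 1, -1, 0, 0], dtype=np.int16)
fine = add_subbit_noise(x, 5, rng)
sub_levels_per_level = 1 << 5
```

With the finer source levels, the big step of the source CMF at one level is broken into many small steps, so a lookup against the destination CMF can land on more destination levels.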
0:11:16When we apply this algorithm
0:11:18we can see the results:
0:11:21the PMF of the genuinized speech is very similar to that of the genuine speech.
0:11:27So we overcame the problem that we had previously.
0:11:32Of course, we also tried it with the logical condition,
0:11:37and the results were pretty good.
0:11:41So it doesn't diminish the previous results on the logical condition,
0:11:47but it dramatically improved the results
0:11:50of the genuinization process with the physical condition.
0:11:56Now we want to see what happens with an anti-spoofing system
0:12:03when we use the genuinization process.
0:12:08So
0:12:09we took the baseline system that was provided by the organisers.
0:12:14It has
0:12:16two classes, for genuine speech and for spoofed
0:12:20speech, and each class is a GMM with five hundred twelve Gaussian mixtures.
0:12:28There are two models, one for CQCC features and another for LFCC
0:12:34features.
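A toy sketch of two-class GMM scoring. The talk only says that each class is modelled by a GMM with 512 Gaussians; scoring by an average log-likelihood ratio, diagonal covariances, the function names, and the tiny 2-component models below are all assumptions for illustration. Feature extraction (CQCC or LFCC) is not shown.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)    # (T, K)
    a = np.log(weights) + log_comp
    # Log-sum-exp over components, done stably.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def llr_score(frames, genuine_gmm, spoof_gmm):
    """Average log-likelihood ratio; positive favours the genuine class."""
    return float(np.mean(gmm_loglik(frames, *genuine_gmm)
                         - gmm_loglik(frames, *spoof_gmm)))

# Toy models with 2 components and 2-dimensional features (hypothetical values).
w = np.array([0.5, 0.5])
var = np.ones((2, 2))
genuine_gmm = (w, np.array([[0.0, 0.0], [0.5, 0.5]]), var)
spoof_gmm = (w, np.array([[3.0, 3.0], [3.5, 3.5]]), var)

score_near_genuine = llr_score(np.zeros((10, 2)), genuine_gmm, spoof_gmm)
score_near_spoof = llr_score(np.full((10, 2), 3.0), genuine_gmm, spoof_gmm)
```

A decision threshold on the score then separates the two classes; the same structure would apply with 512 components per model.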
0:12:35The baseline results are shown
0:12:38in the column of the baseline.
0:12:42In the next column
0:12:43we used the same
0:12:47original GMM models, but now tried
0:12:50to test them
0:12:52on the data after genuinization.
0:12:56So obviously the results
0:12:58of the models dropped.
0:13:01In the next step,
0:13:02we test on the real data, before genuinization,
0:13:08but the GMM
0:13:10models we train on
0:13:14genuinized
0:13:16data,
0:13:17and we see that
0:13:18the generalization capability is very poor; the errors
0:13:23are very big.
0:13:25When we train
0:13:27and test on genuinized speech,
0:13:29the results are very good.
0:13:35We can say, okay,
0:13:37we trained with one data and tested on the same data,
0:13:42so it is logical that the results are good.
0:13:46But I think that a lot of
0:13:49the point
0:13:50of the countermeasure
0:13:52is to
0:13:53be able to recognize spoofing no matter what the attack is, because all the time new
0:14:00spoofing algorithms appear.
0:14:03And
0:14:04if
0:14:05the system works well
0:14:07but is vulnerable to the
0:14:10new algorithms,
0:14:14then it's not robust, it's not reliable, because we never know what will be the
0:14:19actual algorithm.
0:14:23So,
0:14:24to summarize.
0:14:25Well,
0:14:27we showed that there is a big difference between the
0:14:32waveform distributions of the
0:14:37genuine speech
0:14:39and the
0:14:40spoofed speech,
0:14:41at least for
0:14:43replay,
0:14:45and in fact it may
0:14:48be easy to recognise in the time domain the
0:14:54spoofed speech.
0:14:56So
0:14:57first we presented the genuinization process, how we can convert the spoofed
0:15:05speech to be statistically more similar to genuine speech,
0:15:11and we showed that
0:15:12it's better to first add
0:15:14noise
0:15:16to the samples;
0:15:18this small amount of noise
0:15:20leads to better
0:15:22genuinization.
0:15:25Then we tried this against the countermeasure, and we saw that the results can vary
0:15:32dramatically
0:15:33when we train with one data and test
0:15:37with another type of spoofing.
0:15:41It is important to understand that it is unacceptable
0:15:45for an anti-spoofing system
0:15:48to behave like this,
0:15:50because it
0:15:51must have very good generalization capability.
0:15:55And
0:15:57additional work will have to be done
0:16:00in this direction to
0:16:02make
0:16:04systems much more reliable.
0:16:14Thank you very much, and if you enjoyed it,
0:16:17you can press play and listen to it again and again.
0:16:23Stay healthy. Bye.