Speech Transcript - Novel Variable Length Teager Energy Profiles for Replay Spoof Detection

but the way as in addition on a big the or at the moment variable

and teager energy profiles for this assumption

please call per week the nist model or something

presentational by a discussion of speech data or regression

then twenty thousand eleven a short almost is ugly but also a unity it will

be really didn't you feel

and that in addition to

no two thousand three

in order to discuss different all come from each at least one but you

estimate of the system finally

so in this work we will be

the baseline system and the other site by d

but is if only a little training

and then vol challenge in

one seventeen leading system and i just

so we will be

formulated as stand alone and we consider a

and are the natural just you know speech

and then otherwise you will be system which is there

is a story or something what are you hungry if it is only genuine coming

as an actual speech

comparison

in this will be a excluding concentrating on the speech production

so that there is a little or lossless wealthy and single where they not be

honest you know

a little variations the

has basically three aspects lately ready one

environment according to one

so when we also for smart phones five or not using my

i quality that all speaker b

one and conical

the different

experiments i well for these recordings which is

well only is also one in which all

and this is a

we will discuss the modelling all a list of

it in the list movies model you one is a new one next even that's

a strange

during this whole signal s b is not an estimation of the impulse response

of the recording device

microphone

i mean what

plus the known model

you don't

so this imposes was a copy of this series convolutional we build upon so on

then the recording device and as a speakers characteristics

that is multimedia speakers and the and

so relating the blissful means also in signal characteristics differences of iterations models

there were involved only in the

jane speech

to his audibly distorted data so that sources speech you can see that

this is a

actually lower than the worst

this is an actual speech and e g

so then what went on according will only

especially

and then only isn't getting or

or a roller or

and that was one instead of just one here is the presentation

so one of the important characteristic that is here is that it should also gonna

some

in the differences between genuine and it

speech

and are the ones in a distribution because we can see there

most of the day

these changes are data

it is the distortion that is being nervous in the high frequency regions

because of illegal immigrants

and of just like expected because the their transmissions at their sticks on the acoustic

characteristics of gonna the phone

and women

is expected to be bandpass region because

only one can be is the error or because you becomes a

we'll have more stands opinion is

i don't have been system is responsible must therefore

okay was characteristics in the in the previously

in your dynamo in but it just a model of speech function

a we can see that maybe a speech has basically by first order statistics on

the right you can is on which involves only

these concealment speech coding speech

no in this work addressing the stars in addition there me

concentrate on a weekly

on a new data

additional industry on

nolan available in you

the idea is there you know a companion but we discuss the fundamentals of do

you wanna do not you so well before that

the initial requires the basis accent but instead of discontent signal x and then it's

previous that unveiling something minus one

an additional next congestion something that is

we find that it is your data in the next experiment and minus one in
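
The operator described at this point appears to be the Teager energy operator, which uses the current sample together with its previous and next samples. Below is a minimal sketch, assuming the standard textbook discrete form x(n)^2 - x(n-1)x(n+1) and a variable-length generalisation x(n)^2 - x(n-i)x(n+i). The function name `vteo` and the parameter name `i` (the dependency index) are illustrative and not taken from the talk.

```python
import math


def vteo(x, i=1):
    """Variable-length Teager energy: x(n)^2 - x(n-i) * x(n+i).

    i is the dependency index (an assumed name). i=1 gives the classic
    three-sample Teager energy operator. Samples that lack both
    neighbours are skipped, so the output has len(x) - 2*i values.
    """
    if i < 1:
        raise ValueError("dependency index must be >= 1")
    return [x[n] * x[n] - x[n - i] * x[n + i] for n in range(i, len(x) - i)]


# Usage: for a pure tone A*cos(w*n) the profile is constant, A^2 * sin(w*i)^2.
tone = [2.0 * math.cos(0.3 * n) for n in range(50)]
profile = vteo(tone, i=1)
```

For a single sinusoid the output is constant and grows with both amplitude and frequency, which is why this operator is often read as a running energy estimate.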

so that you and then because of this on the amount of structured utilizes the

desire was

but their own meeting the

but

in boston immediate future in the presence

however within a is actually an actual speech signal catches the dependencies

in the signal is also has a signal and these different independent signal is not

a lie i can be your like having something minus one or the u v

mobile

well in the context of speech production and perception we know little or no control

recognition and i for this

no one she

so mostly for something or an introduction cognition perception

well whatever you want to thank you speech

so well motivated by this kind of

okay statistics of the natural speech we also exploit a pu you know if available

in one

while i mean

then we consider only the a initial clustering just and we consider

the i se but in the past and i in which

sure

and as this one and it is not really meant that is a mathematical

in addition

all the sinusoidal previous i and the previous section seven

and a we can see that use basically the new location as explained minus plus

one

excluding and less time based on and

but i score as defined in

one is in this because it captures of bananas yours was

based on the

reynolds

so this is and everything the

we begin this is the pu and this is that it wouldn't exist

consider

there is only just where you're being nor do i even it is

in this isn't the video games or two

it is because there isn't dependency structure because of the pu for these kinds of

and in this case

you the minutes in this is

i feel that it was a good why don't we discussed in

we can see that you also used the described next sure you domain and then

a justice of you in the netherlands

not in this but are we extend our recently proposed remote the actual and b

c basically women because not more

that is used in these easy we have an input speech

preemphasis problem and yet been investigated the and then be cleaning everything more than fifty

one from a nation
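
The pre-emphasis step mentioned for the input speech can be sketched as a first-order high-pass filter. This is a generic illustration, not necessarily the configuration used in the talk: the function name `preemphasize` and the coefficient 0.97 are assumptions.

```python
def preemphasize(x, alpha=0.97):
    """First-order pre-emphasis: y[0] = x[0], y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a commonly used value and an assumption here; the
    coefficient used in the talk is not recoverable from the transcript.
    """
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


# Usage: a constant (zero-frequency) signal is strongly attenuated after
# the first sample, which is the intended high-frequency emphasis.
flat = preemphasize([1.0, 1.0, 1.0, 1.0])
```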

so miserably you'll be explained remedial the reason is there anyone you know

well actually better sticks all basically and dependencies and sequence of both genders speech and

then how did better than the

there is a question that only

a in this is a screen

so for example this is the

two can assume one all basically the speech

you know various acoustic and one that we discuss they can control spending analysis and

can see that the view point has got the ones which we discuss not be

a and b

their ability to just really the speech forty one

the final and one

no this is that he's for the initial clusters of similar to speech that we

'cause there's not as is that in the component other

was is trained using that as convolutional physically

impulse response will be these the resulting was it is obvious cases of this

all signals are inverse discrete cosine and sinus

that is themselves layers

we examine the impostors

the man digging out of the impostors and

and weddings an option

we can see that the pu provider maintains the high energy pulses and an additional

okay

the there's characteristics within that will use in which their children adaptation transforms used by

one morning or anything they're also that also for the natural speech in the u

a visiting then more only

so that it is which means you think that in cases in a considerable so

that it almost

with a single moment

earlier

which is basically in the next to the model mice is channel factors something more

than one indicating in the morning shows that are running and you changed

speech production that which should also direct relation

i in this study we are really

this

characteristics

this decoder just basically

speech

only the achievable when the anyone who have a variable are you gonna show a

well fine

for actually than that of speech shown basically well why a beautiful place

corresponding this is a to be

but in this feature a fight for

the weather the rest of your own et al

i just t

and the different for different values of the difference in the next layer for example

for this

and allows

and the elements in this is one is to show all three and one as

well you know that for different is the next

five

when using basically here with additional features and then there's each one woman and an

actual speech and hence we consider this s

secondly

better a discriminative you

for using the

and b a value in the pca projection

so these differences are also clearly better for the prior knowledge of multiple files are

used for the natural and you than the one of the financial speech and

but in order to utilize probability speech and that or distinctions this

not be quite and we plan and text i is differences

for doing that she

this is innovation

slightly distribution it's a function

for

okay well i mean i speech and the speech bin z there and the standard

batteries but this in figure you're to be figure in the world melodies

for spectral

i suspect and this is because

ten milliseconds each other ones for an s and h

we can see clearly there at the start with just here in both cases are

lower there are one get better result shows you are working together and it is

features really able to see what

focused

and high resolution of formant structure an overall distortion

one an active speech

and the signal doesn't features which are no

can be captured but only

well known as features in the residual so which is being unity

namely

during speech

and this is in profile the textual this profile always be there for the various

values of an index that is

well human speech

we used to using only the energy based vad point five we used and is

a ribbons in next

thus the phone recognition as well

and then be seen that are really

we see their four one and can see that for the various different as in

this distribution as producing features and testing a each altogether as it is there is

measured for different values of you

a one different messages consider

most of the five one solution to capture some features general capturing the traditional table

for that this is the t i one

in this thing using the standard statistically meaningful

is longer than surrounding wasn't two database

and it is that you one i in this work you the initial search

and in experiment is a little difference you the

these are not i logistic thing and assisting different no matter just like

for each of these features are going fourteen on the cross gender engine is varying

from one twenty thirty nine ninety

the motivation z-norm mixture component gmms okay and ninety five one

and we use basically different ones

in this work is in gmm simple gmms

this is a for successful results using a it is interesting

with that is for refinement is a dependence you next

you can see that basically

and anything that's

forty eight was the one they but it is my final and basically we consider

forty eight to five they are used for six point five significant and is a

twenty five percent

or represented as

which in the usual significant improvement in

and has fewer can be a to find an optimal choice of measurements index for

this experiment

and this is basically the

locatable score retire fungus you gmm and was you sure well based on all distributions

of the solutions

all sequences e

and this is an analysis e

and is mfcc and matrix

you can see that for you just distribution has to be well signal estimation whereas

for residual different for a gmm

this on the development

no you're not experiments for basically these features are like a combination of them but

you forty eight one

well and then you mfcc and the n six z

if it also there is used a list of the unlike this is from not

just like mfcc

i this is e

and six is significant performance improvement then

we both models going able to model well basically smooth and phone it is easy

features we can

so it is easy

then we'll is used in the ecstasy

and m c and we also there is really can strategy

this result is as you we just use this almost indicating that

but with features it also captures complementary information

then the baseline of the challenge can and

is systems on t

wasn't retirees wasn't one iteration

this is the and already you know

and then we also show the performance using a detection error tradeoff curve so we

can also they're the performance of the det curves for a way to one this

is one is basically
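
A detection error tradeoff curve plots the false acceptance rate against the false rejection rate as the decision threshold moves, and the equal error rate is the point where the two meet. The sketch below is a simple threshold sweep under the assumption that a higher score means "more likely genuine". The function name and the toy scores are illustrative and are not results from the talk.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) from a sweep over all observed scores.

    Assumes higher scores indicate genuine speech. The EER is
    approximated at the threshold where the false acceptance and false
    rejection rates are closest, as the mean of the two rates.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    best = None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Usage with made-up scores (not data from the talk).
eer, threshold = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 3.5])
```

Collecting the (far, frr) pair at every threshold instead of keeping only the closest one gives the points of the curve itself.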

mfcc then security

this is one

and this is an existing data for me from clean the proposed features and screams

and

and similar training actually almost or with

also features are only one

however the fuses well formed elements are defined in and function indicating there

you models there is resigning

but are trained using the that the justice to perform better than the engine just

the decision features

i don't use and

here is an analysis or physically and one or more efficient well money well mauritius

physically model issues

and i saw in one additional from the perspective

so that reducing the problem first final one is okay

new the bar e

e here is for the natural and this a three different this from the different

characteristics like benefit

a high quality classes

three one and you'll be playing on the only problem

a message in which

so we can see their this is the sum of implementation and fast implementation and

they are very real distinct and is a weighting

involve a harmonic structure is or

in an actual speech but there is no result obviously

you need for

this is definitely a cost you difference between the natural and the

finally we evaluated the sickly

using this costings you can see that

the views different contributions like environment acoustic environment that voice recording ways

and we can see their own but for the proposed features to meet is even

da was to find the list equal and

existing we just like dimensions using consisting and

so this is showing me for an answer was you just on different conditions

find it was always in this work we take your batteries exploiting question

the idea was features to d c you know okay the menu

movies easy and everything but

and this is only on better for different decomposition of a controversial but was not

affected by the owners of different one

number of channels but she was adamant use this for most beneficial for the two

streams

this forms as a

well

on the final experimentation

i don't know we need was actually impulse response of random should be my acoustic

environment

we should definitely a landing on the nist is immensely challenging

things as well

with this knowledge yet results using line data and in as you gonna condition a

one time someone colours audio research

we also kind of the organisers of recognition workshop twenty and challenges of this is

what we also want to challenge

really and also

indeed it was made available but not from in this experiment be

statistically meaningful system not

and finally the citizens just

and we i

on the phone and h

Novel Variable Length Teager Energy Profiles for Replay Spoof Detection

Spoofing and Countermeasure 1

Madhu Kamble, Hemant Patil