and

just that the stance was you met and the statistics in as creation ask data

and the subject of my talk is to introduce an improved method for text independent

phonetic segmentation based on the microcanonical multiscale formalism

in brief

i will first focus on what we mean by viewing speech as a complex signal

in the physical sense

that is to say to view it

as a realisation of a complex system

after that

i will introduce a formalism coming from the study of complex systems which might be a powerful tool

to catch the nonlinear character of the speech signal

this is called the microcanonical multiscale formalism or M M F

and i will show the general potential of the M M F to be applied in speech

analysis

and then i will turn to the

application of this formalism to phonetic segmentation of the speech signal and i will introduce

a basic algorithm and an improvement to it for segmentation

and finally i will take some time to present experimental results and to conclude

so it has been

theoretically and experimentally established that there exist

strong nonlinear phenomena in the production process of the speech

signal for example the reynolds number which is a

number characterising different flow regimes

is reported to be as high as several thousand

which corresponds to turbulent flow

while as we know most of the

approaches in speech processing are

based on the linear source-filter model which cannot adequately take into account

the nonlinear character of the speech signal

hence our goal here is to find and evaluate the key parameters which are responsible for the complex

character of the speech signal

previous studies have shown that such parameters do exist but they are very hard to estimate

our strategy is to take the

knowledge coming from statistical physics and to relate the complexity with the predictability of each point inside

the signal

and in practice we need to

develop computationally efficient tools

to estimate these parameters if they exist and to use them for practical analysis

which is the important point

in the study of complex systems the first phase started in the forties with the classical work

of kolmogorov

which was the basis for the later approaches in this domain

which are based on the study of structure functions

the main result of these methods is to

recognise globally the existence of a multiscale structure without giving access to

its location

i mean

their use is too

limited

because they are based on statistical averages and on the stationarity assumption

they can be used to decide whether a system is complex or not but they do not give much more information

in the second phase instead we try to

determine geometrically inside the signal where the complexity happens and how it organises itself

in more

precise terms we try to find a subset inside the signal which has the highest

information content and we try to explain how

the transfer of

information between different scales

organises itself

these methods have been made possible by the approach of statistical physics in the study of

highly disordered systems and turbulence

the study of the notion of

transition inside a complex system

has shown that geometric multiscale organisation is responsible for the complexity inside

a signal

a typical example for this is the cascade of energy in fully developed turbulence

its fingerprint is the existence of a power law behaviour in the temporal correlation function

which has to be

evaluated without any stationarity assumption at each point inside the signal

the exponents related to this power law are as we will see shortly

called singularity exponents and it can be shown that they completely explain the

organisation of the multiscale structures

an example of this

is the canonical formalism that is used in the study of multiscale signals

the canonical formalism was the first attempt to estimate

singularity exponents as a global property of the signal through

what is called the legendre spectrum in this equation we have

a complex signal s

and a multiresolution function gamma acting at the scale r

and the brackets stand for expectation over

a statistical ensemble

the exponent of this power law tau of p can be related to the

distribution of singularity exponents

through a legendre transform but the main problem is that it is a global description it does not give access to

the geometry

and the local dynamics of the signal
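
For reference, the two relations described in this passage can be written out as below. This is a reconstruction in the notation commonly used for the canonical formalism, not a copy of the slide: the symbols $\tau(p)$ for the scaling exponent, $D(h)$ for the Legendre spectrum and $d$ for the dimension of the signal domain ($d = 1$ for speech) are assumptions.

```latex
\langle\, |\Gamma_r s|^{p} \,\rangle \;\propto\; r^{\tau(p)},
\qquad
D(h) \;=\; \inf_{p} \bigl\{\, p\,h + d - \tau(p) \,\bigr\}
```

The average on the left is taken over the whole ensemble, which is why the resulting spectrum says how often an exponent occurs but not where in the signal it occurs.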

so in the

microcanonical formalism we try

instead of relying on statistical averages to look at

the signal itself

and we try to introduce

singularity exponents each of which

is related to a geometric location inside the signal that is

the time index t here

with the

multiresolution function gamma r

and as you can see here the

exponent of this power law is what we call the singularity exponent H of T

and

it can be estimated

precisely at each point to

reveal the transition fronts of the signal
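
Written out, the local power law described here would read as below. This is a sketch in the usual microcanonical notation rather than the exact equation from the slide; the amplitude $\alpha(t)$ is an assumed symbol for the signal-dependent prefactor.

```latex
\Gamma_r\bigl(s(t)\bigr) \;=\; \alpha(t)\, r^{\,h(t)} \;+\; o\bigl(r^{\,h(t)}\bigr),
\qquad r \to 0
```

Unlike the canonical relation, no ensemble average appears, so $h(t)$ is attached to one time index $t$.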

yeah

the main problem is the precise estimation of these parameters

and in this regard one of the crucial

choices is the choice of the functional gamma r for example we can use

simply the linear increments

but it has been shown that this does not give a precise estimation of H of T because of

the

instability and sensitivity of these

linear increments

a better choice

is the gradient modulus measure which is defined as the integral of the gradient modulus over the

ball

of radius r around t in this equation normalized by the lebesgue measure of the ball

it is defined from the typical characterisation of

kinetic energy in turbulence and

it has been shown that it

is related to the information content of each point if we use this measure for

the

calculation of H of T
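
A minimal sketch of how such an estimate could be computed is given below. It is not the estimator used in the talk: it assumes a discrete gradient `s[k+1] - s[k]`, a ball normalized by its length in samples, and a plain least-squares regression of the log measure against the log scale. The function names and the set of radii are hypothetical.

```python
import math


def gradient_modulus_measure(signal, t, r):
    """Average of |s[k+1] - s[k]| over the ball of radius r around sample t.

    The normalization by the ball length is an assumption of this sketch.
    """
    lo = max(0, t - r)
    hi = min(len(signal) - 1, t + r)
    if hi <= lo:
        return 0.0
    total = sum(abs(signal[k + 1] - signal[k]) for k in range(lo, hi))
    return total / (hi - lo)


def singularity_exponent(signal, t, radii=(1, 2, 4, 8, 16)):
    """Slope of log(measure) versus log(r) at sample t (hypothetical helper)."""
    points = []
    for r in radii:
        m = gradient_modulus_measure(signal, t, r)
        if m > 0.0:
            points.append((math.log(r), math.log(m)))
    if len(points) < 2:
        return float("nan")  # not enough scales with non-zero measure
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

With this convention a smooth ramp gives an exponent close to zero and an isolated jump gives a clearly negative exponent, which matches the idea that the lowest exponents mark the sharpest transitions.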

so once we have a good estimate of H of T

we can form

a very important subset inside the signal which is called the most singular manifold this corresponds to

the

points inside the signal which have the lowest values of singularity exponents

it has been shown that the

lower the value of the singularity exponent is the higher

the unpredictability is at the given point

so the critical transitions of the signal are happening

at these points

and a reconstruction formula has been proposed

and it has been shown in many applications that we can reconstruct the whole signal having access to only

this small subset of the data

so this was just to show the importance of the singularity exponents

after that we can turn to see how they can be applied to the speech signal

previously we have shown the estimation procedure of H of T for a speech signal and we have shown

that we can have

a good estimate of H of T for the majority of points in the speech signal here we

have a speech signal extracted from the

timit database with vertical red lines which show the

phoneme boundaries taken from the manual transcriptions provided in the timit database and

of course the objective of text independent phonetic segmentation is to identify these phoneme boundaries

within a

tolerance window

so

since

different phonemes

as we know have different statistical properties we

expect the singularity exponents to have different behaviours

to show this we studied the

distribution of the singularity exponents that is the time evolution of the distribution of singularity exponents

so we take windows of length thirty milliseconds we compute the

histogram of H of T

and we plot its

evolution over time

in this graphical representation which is the

histogram of singularity exponents conditioned on time

one can easily note a remarkable change in the distribution of singularity exponents between different phonemes

this has been extensively

evaluated over different speech

signals

but the problem is that we cannot use this

graphical representation for developing

an automatic segmentation algorithm

we have to provide a quantity

which is easier to be used by an automatic algorithm

the easiest interpretation of this change in distribution is a change in the average

so we define a new measure we call it A C C which is simply the primitive of the singularity exponents

and

this could be considered as the instantaneous average of singularity exponents
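
In discrete time the primitive described here reduces to a running sum. The snippet below is only an illustration of that idea; the function name `acc` is hypothetical and any scaling by the sampling period is left out.

```python
from itertools import accumulate


def acc(exponents):
    """Running sum (discrete primitive) of a sequence of singularity exponents."""
    return list(accumulate(exponents))
```

If the exponents fluctuate around a constant mean inside a phoneme, this running sum grows roughly linearly with a slope equal to that mean, so a change of mean shows up as a change of slope.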

we can see the resulting functional

and it is clear that it shows

the difference in distributions more clearly

so inside each phoneme the

A C C is

more or less linear and we can note a change in

slope at each phoneme boundary

hence

to develop an automatic

segmentation algorithm a very simple method is to fit a piecewise linear curve to this

A C C by minimizing the mean square error

here we have

the A C C along with the fitted curve

and we have identified the breaking points as the candidate boundaries
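
One simple way to obtain such breaking points is a top-down split: find the single break that most reduces the squared error of a straight-line fit, then recurse on both sides. The sketch below follows that idea and is not necessarily the fitting procedure used in the talk; `threshold` (minimum error reduction needed to accept a break) and `min_len` (minimum segment length in samples) are hypothetical parameters, and the quadratic cost is only acceptable for short examples.

```python
def line_fit_sse(y, i, j):
    """Sum of squared errors of the least-squares line fitted to y[i:j]."""
    n = j - i
    if n < 3:
        return 0.0  # two points or fewer are always fitted exactly
    xs = range(i, j)
    mean_x = (i + j - 1) / 2
    mean_y = sum(y[i:j]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y[x] - mean_y) for x in xs)
    slope = sxy / sxx
    return sum((y[x] - mean_y - slope * (x - mean_x)) ** 2 for x in xs)


def break_points(y, i=0, j=None, threshold=1.0, min_len=5):
    """Indices where a piecewise linear fit of y changes segment."""
    if j is None:
        j = len(y)
    whole = line_fit_sse(y, i, j)
    best_k, best_err = None, whole
    for k in range(i + min_len, j - min_len + 1):
        err = line_fit_sse(y, i, k) + line_fit_sse(y, k, j)
        if err < best_err:
            best_k, best_err = k, err
    # Accept the break only if it reduces the error by at least `threshold`.
    if best_k is None or whole - best_err < threshold:
        return []
    left = break_points(y, i, best_k, threshold, min_len)
    right = break_points(y, best_k, j, threshold, min_len)
    return left + [best_k] + right
```

Because the fit runs directly on the sample-level curve, the returned indices are sample positions rather than frame positions.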

you can see that we detect

most of the

boundaries with very good resolution

because we do not have any windowing

problem in this method we have

access to the highest possible resolution which is the sampling frequency of the speech signal

so

the preliminary simulations show that this

simple method

has comparable results with the state of the art this was presented in our previous work

and

an advantage is that it is not

sensitive to the threshold

selection as we will see in the experimental results

but by performing an error analysis of this method we observed that

in some cases

there is a distinct difference in the distribution of singularity exponents but the A C C is not able to reveal

it to

identify the

boundaries

and there are points where there is no distinctive

change in the distributions but the A C C and linear curve fitting make some mistakes

hence we tried to use a

classical approach in the

detection of change

change detection which has been widely used in segmentation

which is a two step procedure first

to select a set of candidate boundaries

and then to use a hypothesis test to make the decision to

see whether each candidate corresponds to a change in the

given features or not

so for the first step the candidate selection we have two observations first we saw that some of the missed

boundaries correspond to the

transitions between fricatives stops and vowels

and

second we saw that the easiest

positions to detect are the transitions between

low energy segments such as silences or pauses and phonemes because

in silence we have

a high positive value of singularity exponents and in active parts we have

mainly negative values

so it is easy to

detect the change in the

slope of the A C C

hence we thought to

apply a low pass filter to the original signal and do exactly the same

to compute the singularity exponents and the A C C for the low pass signal as an example in the

left

figure you can see the A C C of the original signal and in the right one you can

see the A C C of the low pass filtered

signal we know that fricatives and stops

are essentially high band signals and low pass filtering

turns them into

low energy signals

you can see that in the left

figure we have some change in the

shape of the A C C but it is not easy to detect with the

linear curve fitting but on the right hand side it is

much easier to detect here is another example again i emphasise that we have the change in

the original A C C

but it is

not easy to detect

while in the low pass version on the right hand side

it is really easy to detect

so as the first step we apply the A C C algorithm

to the

signal and its low pass filtered version

and we

gather all the breaking points as the candidates

and in the second

step we perform

dynamic windowing

followed by a log likelihood ratio hypothesis test to see

for each one of the candidates whether it actually corresponds to a change in the distribution of singularity exponents or not

i emphasise that we do this on the singularity exponents of the signal itself because we are interested

to show the strength of singularity exponents the low pass filtered version

does not have any real meaning it just adds some diversity to our algorithm

so this is the dynamic windowing procedure for each candidate point

we consider the windows on each side of the candidate

and we have two

hypotheses

the first hypothesis is that the singularity exponents are generated by a single

gaussian or

they are generated by two gaussians one on

each side of the candidate

so if H one is more

likely than H zero we take the candidate as the boundary otherwise we remove

it from the candidate list then

we go to the next

candidate
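
The decision step described here can be sketched as a log likelihood ratio between one Gaussian fitted to both windows together and two Gaussians fitted to each window separately. The code below is an illustration under assumptions, not the procedure from the talk: the windows are passed in directly (the dynamic windowing itself is not reproduced), the Gaussians are fitted by maximum likelihood, and `threshold` is a hypothetical parameter. Some threshold or penalty is needed in practice because with maximum likelihood fits the two-Gaussian model can never score lower than the single Gaussian.

```python
import math


def gaussian_loglik(samples, var_floor=1e-12):
    """Log likelihood of samples under their own maximum likelihood Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = max(sum((v - mean) ** 2 for v in samples) / n, var_floor)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def log_likelihood_ratio(left, right):
    """log L(two Gaussians, one per window) - log L(single Gaussian on both)."""
    both = list(left) + list(right)
    return gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(both)


def keep_candidate(left, right, threshold=0.0):
    """True if the candidate between the two windows is kept as a boundary."""
    return log_likelihood_ratio(left, right) > threshold
```

Here `left` and `right` would hold the singularity exponents of the original signal on each side of a candidate, in line with the remark that the test is run on the signal itself and not on its low pass version.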

so

in the experiments our simulations were done on the timit database on the full training part of timit which consists

of four thousand and six hundred

sentences and we have developed the

algorithm over randomly chosen files from this data

we have

tried to report all the possible performance scores because it is difficult in the literature to compare

we have reported all of them to simplify later comparisons

there are two categories of

scores first the partial scores where we have hit rate which shows the

rate

of correctly detected boundaries over segmentation which shows

how many more boundaries we have detected than the reference and false alarm which shows

how

many

false boundaries we have detected

the problem with these partial scores is that

they can go in opposite directions for example an improvement in hit rate

could correspond to an increase in false alarm rate so we cannot do a

fair comparison based only on the partial scores

to combine these partial scores global measures have been introduced

for example F one

value takes hit rate and false alarm into account and R value takes hit rate and

over segmentation into account with

more emphasis on the over segmentation rate
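
The scores named here can be computed as in the sketch below. The exact formulas are not stated in the talk, so the definitions are assumptions taken from common usage in the segmentation literature: hit rate as hits over reference boundaries, over segmentation as detected over reference minus one, false alarm as the fraction of detected boundaries that are not hits, F1 as the harmonic mean of hit rate and one minus false alarm, and the R value with its usual distance-based definition on percentages. The greedy matching rule and all function names are hypothetical.

```python
import math


def count_hits(reference, detected, tolerance):
    """Match each reference boundary to at most one detection within tolerance."""
    used = set()
    hits = 0
    for ref in reference:
        best = None
        for idx, det in enumerate(detected):
            if idx in used or abs(det - ref) > tolerance:
                continue
            if best is None or abs(det - ref) < abs(detected[best] - ref):
                best = idx
        if best is not None:
            used.add(best)
            hits += 1
    return hits


def segmentation_scores(reference, detected, tolerance):
    """Partial and global scores; `reference` is assumed to be non-empty."""
    n_ref, n_det = len(reference), len(detected)
    n_hit = count_hits(reference, detected, tolerance)
    hr = n_hit / n_ref                      # hit rate
    os_rate = n_det / n_ref - 1.0           # over segmentation
    fa = (n_det - n_hit) / n_det if n_det else 0.0  # false alarm
    pcr = 1.0 - fa
    f1 = 2.0 * pcr * hr / (pcr + hr) if (pcr + hr) > 0 else 0.0
    # R value is defined on percentages.
    r1 = math.sqrt((100.0 - 100.0 * hr) ** 2 + (100.0 * os_rate) ** 2)
    r2 = (-100.0 * os_rate + 100.0 * hr - 100.0) / math.sqrt(2.0)
    r_value = 1.0 - (abs(r1) + abs(r2)) / 200.0
    return {"HR": hr, "OS": os_rate, "FA": fa, "F1": f1, "R": r_value}
```

For instance, `segmentation_scores([100, 200, 300], [102, 198, 250, 305], 10)` finds every reference boundary but pays for the extra detection through the over segmentation term, so both global scores drop below one even though the hit rate is perfect.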

for the experimental results first we compare

the A C C algorithm with the improved algorithm

over

different test utterances

we can see that we have like

two or three percent

improvement in false alarm rate and like

four percent in over segmentation and hit rates are more or less the same

so this shows the

improvement over the previous algorithm

then

we compared to the

reference method which is the

state of the art in the literature

we can see that for the tolerance of twenty five milliseconds we have almost the same

hit rate

but an improvement in false alarm and we have

ten percent improvement in over segmentation

rate

and more importantly if we go to

lower tolerances for example five milliseconds we can see that

for

all of these we have like

more than ten percent improvement in hit rate false alarm and over segmentation this is because of the

high resolution of the A C C function

as i mentioned

we do not have windowing we have access to the finest possible resolution

in terms of global measures we can see

that for lower tolerances we have more than ten percent improvement in both of the

scores and for twenty five milliseconds we have like six or four percent

improvement in R value and F one value

as i mentioned the method is not sensitive to the threshold which is a

problem of the

classical

text independent methods of phonetic segmentation

here we

have shown the

sensitivity of the scores to the curve fitting threshold

we have changed by over four hundred percent

the value of the threshold and the

R value has changed only by zero point five percent this shows that

the choice of the threshold is not important at all in this algorithm

this threshold independence is an important feature

of the method

with this study we have emphasised the strength of singularity exponents in the detection of

transition fronts in the speech signal

and more importantly the

encouraging results in phonetic segmentation show the

potential of the M M F in the analysis of the local dynamics of the speech signal hence our

future work is to use the M M F in

other domains of speech technology

and to use the

reconstruction formula which is an ongoing research and

we hope to have good results in that

thank you very much for your attention

right on time

we can take questions one on one but this is officially the end of the session

oh

okay

yeah

i