Speech Transcript - FUNCTION OF PHASE-DISTORTION FOR GLOTTAL MODEL ESTIMATION

0:00:13	i will
0:00:13	show the also has a three minute warning
0:00:16	okay uh i can just and the fifty minutes presentations not tick
0:00:20	oh it's about exist display
0:00:22	yeah yeah hi everybody have time to to be yeah um uh for the morning not to use a scroll
0:00:28	do but
0:00:29	thank you
0:00:29	so
0:00:30	speak to yeah function that says
0:00:32	just as shown
0:00:32	so a the mother used mission
0:00:36	um
0:00:37	will
0:00:37	yeah yeah that's use a level of used
0:00:40	it was the question of a
0:00:44	reviewer of this paper
0:00:47	and i even if may looks like a bunch question
0:00:52	right
0:00:52	question
0:00:53	i
0:00:54	is is the worst
0:00:56	uh
0:00:57	use one H
0:01:00	or
0:01:01	so i will just take the first minutes
0:01:04	to give you some context and try to answer this question
0:01:10	so voice production is mainly made of three components: the glottal source,
0:01:15	generated at the larynx level, the vocal tract filter, and finally the radiation
0:01:20	at the mouth level
0:01:22	and then
0:01:23	leading of the vocal a use
0:01:26	a propagated waveform
0:01:28	can be perceived by someone
0:01:30	my field of research is voice transformation
0:01:34	thus
0:01:34	the question is how to transform a propagated waveform
0:01:38	in order to make it perceived differently in terms of
0:01:41	voice quality, in terms of timbre, in terms of
0:01:44	any other perceived element of the voice
0:01:49	and one idea
0:01:50	is
0:01:51	to
0:01:52	split the propagated waveform
0:01:54	in order to retrieve
0:01:56	the fundamental elements of the voice
0:01:59	uh and in any other the all sort of a bit up to some the addition
0:02:04	but
0:02:05	in order to do this
0:02:07	it is necessary to invert the voice production, which is
0:02:11	quite a tricky problem
0:02:14	and one solution,
0:02:15	if we need to invert
0:02:18	such a problem,
0:02:19	is to add some constraints, and
0:02:22	for this, glottal models
0:02:24	are very useful
0:02:26	because they describe the glottal waveform and add some
0:02:30	constraints for this
0:02:32	so the question is now
0:02:34	how to estimate the shape parameters
0:02:36	of a glottal model
0:02:40	so first i will briefly describe the model
0:02:42	of voice production that is used
0:02:45	as i said, there is a glottal source, a vocal tract filter and a radiation. the glottal source
0:02:49	is made of
0:02:50	a shape
0:02:52	in the time domain
0:02:53	this shape has
0:02:54	obviously a time position
0:02:57	and then we assume that this shape is periodic
0:02:59	then we assume that the vocal tract is minimum phase,
0:03:03	all the zeros of
0:03:05	its
0:03:06	z-transform lie
0:03:08	inside the unit circle
0:03:10	and finally we assume that the radiation can be simply
0:03:14	modeled using a simple time derivative,
0:03:18	which is, in the frequency domain, jω
0:03:22	and finally we use a harmonic representation
0:03:27	uh it may that's
0:03:29	as
0:03:29	also the and
0:03:31	um
0:03:33	harmonics
0:03:34	is simply uh defined as
0:03:36	H
0:03:37	so you will find the now a a a a a phase component
0:03:41	as as a complex six L also
0:03:44	H
0:03:44	C and uh a uh it's or addition is simply we use
0:03:49	to
0:03:49	uh G H
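The voice production model described in this passage can be written compactly. The block below is only a sketch in assumed notation: the symbols φ (time position), θ (shape parameter), G (glottal pulse spectrum), C with a minus subscript (minimum-phase vocal tract filter), f_0 and S_h are chosen here for illustration and may differ from the ones shown on the slides.

```latex
% Assumed notation, not taken from the slides:
%   \phi        time position of the glottal pulse (linear-phase term)
%   G^{\theta}  spectrum of the glottal pulse with shape parameter \theta
%   C_{-}       minimum-phase vocal tract filter
%   j\omega     radiation, modeled as a time derivative
S(\omega) = e^{j\omega\phi}\; G^{\theta}(\omega)\; C_{-}(\omega)\; j\omega

% The source is assumed periodic, so only the harmonics are observed:
S_h = e^{j\omega_h\phi}\; G^{\theta}(\omega_h)\; C_{-}(\omega_h)\; j\omega_h,
\qquad \omega_h = 2\pi h f_0, \quad h = 1, 2, \dots
```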
0:03:53	so what is proposed in this paper
0:03:56	is the function of phase distortion. what is that?
0:04:00	uh yeah I Ds
0:04:01	a
0:04:02	first
0:04:03	it describe
0:04:04	you see you know a D question on describing the
0:04:07	do the function of phase distortion
0:04:09	and the first idea is to remove
0:04:12	any minimum-phase component from the main argument
0:04:15	X
0:04:16	H
0:04:17	using its minimum-phase realization through the real
0:04:22	cepstrum
0:04:23	then
0:04:25	right
0:04:25	this
0:04:26	um
0:04:29	this way of removing the minimum phase already
0:04:32	helps,
0:04:33	for example, to remove the contribution to the phase
0:04:37	of the vocal tract filter, which could be present in the argument X H
0:04:42	then we use the second-order difference operator in order to remove
0:04:46	any linear-phase component remaining in this
0:04:50	division
0:04:51	and finally we use an anti-difference operator
0:04:55	in order to obtain a definition similar to the group delay,
0:04:59	which is meaningful in terms of
0:05:01	phase distortion
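The three steps just described (removing the minimum-phase component through the real cepstrum, a second-order difference, then an anti-difference) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the harmonics are taken to be the FFT bins of a single period, and the second-order difference is computed on complex values to avoid phase unwrapping.

```python
import numpy as np

def minimum_phase_realization(X):
    """Minimum-phase spectrum sharing the magnitude of X.

    X is a full-length FFT of even length N. The real cepstrum is folded
    onto its causal part (the usual homomorphic construction).
    """
    N = len(X)
    log_mag = np.log(np.maximum(np.abs(X), 1e-12))  # floor avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real            # real cepstrum
    fold = np.zeros(N)
    fold[0] = 1.0
    fold[1:N // 2] = 2.0
    fold[N // 2] = 1.0
    return np.exp(np.fft.fft(cepstrum * fold))

def phase_distortion(X):
    """Sketch of the three steps on the harmonics k = 1 .. N/2 - 1."""
    N = len(X)
    # Step 1: divide out the minimum-phase component (unit-magnitude result).
    R = (X / minimum_phase_realization(X))[1:N // 2]
    # Step 2: second-order phase difference; it cancels any linear phase.
    d2 = np.angle(R[2:] * R[:-2] / R[1:-1] ** 2)
    # Step 3: anti-difference (cumulative sum), a group-delay-like curve.
    return np.cumsum(d2)

N = 64
pulse = np.zeros(N)
pulse[:2] = [1.0, 0.5]               # minimum phase: zero at z = -0.5
delayed = 3.0 * np.roll(pulse, 5)    # same pulse, shifted in time and rescaled
flipped = np.zeros(N)
flipped[:2] = [0.5, 1.0]             # maximum phase: zero at z = -2

fpd_delayed = phase_distortion(np.fft.fft(delayed))
fpd_flipped = phase_distortion(np.fft.fft(flipped))
print(np.max(np.abs(fpd_delayed)))   # numerically negligible
print(np.max(np.abs(fpd_flipped)))   # clearly non-zero
```

In this toy case, the delayed and rescaled minimum-phase pulse gives a curve that is numerically zero, while the time-reversed (maximum-phase) pulse does not, which is the sense in which the measure responds only to the shape of the pulse.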
0:05:05	so as an
0:05:07	example
0:05:08	of those functions of phase distortion, maybe you have heard about the Liljencrants-Fant glottal model,
0:05:14	which describes
0:05:16	the shape
0:05:17	of the glottal pulse
0:05:20	and we can use a simple parametrization of this glottal model
0:05:24	using the transformed Liljencrants-Fant model
0:05:27	and this parameter, Rd, is the one which is used
0:05:32	if
0:05:32	this parameter is small, the pulse
0:05:35	looks more like an impulse
0:05:37	and if this parameter is big, it looks more like a sinusoid
0:05:42	at the bottom right
0:05:44	of
0:05:46	the slide you can see a plot
0:05:49	which describes the function of phase distortion for the first three harmonics
0:05:55	with respect to this shape parameter
0:05:59	so
0:06:00	these functions are independent
0:06:02	of
0:06:03	many
0:06:05	elements and they are only related to the shape:
0:06:08	they are independent of the time position thanks to the second-order phase difference,
0:06:13	independent of the amplitude thanks
0:06:15	to the normalization by the minimum-phase realization,
0:06:18	independent of the duration of the pulse
0:06:21	since
0:06:22	we work
0:06:24	on a harmonic model,
0:06:26	and finally independent
0:06:28	of the minimum-phase component
0:06:31	so those functions are only related to the shape of the pulse
0:06:40	an application of those functions of phase distortion is the estimation of the Rd parameter
0:06:46	of the
0:06:46	Liljencrants-Fant model
0:06:49	and in order to do this
0:06:51	i will briefly describe the method which is used
0:06:55	for this
0:06:56	estimation, which is
0:06:58	phase minimization,
0:07:00	which
0:07:01	has already been used
0:07:02	in order to determine instants of
0:07:05	significant excitation
0:07:08	first we use the convolutive residual, which is the division, in the spectral domain,
0:07:12	of the observed spectrum by the model
0:07:17	then we can say that if the model corresponds to the observed signal,
0:07:22	then the convolutive
0:07:23	residual is equal to one,
0:07:26	which means that
0:07:28	its amplitude spectrum is equal to one and its phase spectrum
0:07:32	is equal to zero
0:07:34	so the idea of phase minimization
0:07:37	is
0:07:37	first
0:07:38	to ensure that
0:07:39	the amplitude spectrum of the convolutive residual is constant
0:07:43	and then the simple idea is to minimize
0:07:45	the phase spectrum
0:07:47	in order
0:07:48	to fit
0:07:49	the parameter to the observed signal
0:07:54	so we can see that,
0:07:55	given the voice production model as described before,
0:07:59	with the linear phase, the shape of the glottal pulse, the vocal tract filter and the radiation,
0:08:06	and
0:08:07	the function of
0:08:08	phase distortion we proposed,
0:08:10	these functions can be applied to the convolutive residual
0:08:14	we know
0:08:15	scrap the
0:08:16	the question but
0:08:17	you can just see of the file and uh for the last line
0:08:20	but that you can see a representation of the function of its station
0:08:25	by means of C
0:08:26	yeah
0:08:27	division by the minimum-phase realization of the numerator,
0:08:31	with an additional linear phase, and this
0:08:34	linear phase will be removed
0:08:36	thanks to this second-order phase difference
0:08:40	such that we can minimize the phase spectrum
0:08:44	of the convolutive residual
0:08:47	by minimizing this
0:08:48	error function, which is named MSPD2, for mean squared phase using second-order
0:08:54	phase difference,
0:08:55	which is simply
0:08:56	the square of
0:08:58	the function of phase distortion
0:09:00	of
0:09:01	the observed signal by the LF model
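The reason for using a second-order phase difference in the error function, rather than the raw phase of the convolutive residual, can be illustrated with a small sketch. Everything here is assumed for illustration: the function names are hypothetical, the harmonic values are arbitrary stand-ins and not an LF model, and the division by the minimum-phase realization mentioned in the talk is left out for brevity.

```python
import numpy as np

def convolutive_residual(observed, model):
    """Spectral division of the observed harmonics by the model harmonics."""
    return observed / model

def mean_squared_phase(residual):
    """Plain mean squared phase; sensitive to any time shift."""
    return float(np.mean(np.angle(residual) ** 2))

def mspd2(residual):
    """Mean squared second-order phase difference across harmonics."""
    d2 = np.angle(residual[2:] * residual[:-2] / residual[1:-1] ** 2)
    return float(np.mean(d2 ** 2))

h = np.arange(1, 21)                            # harmonic indices (hypothetical)
model = np.exp(1j * 0.3 * np.sin(0.4 * h)) / h  # arbitrary stand-in model
shifted = model * np.exp(-1j * 0.2 * h)         # same shape, shifted in time
other = model * np.exp(1j * 0.05 * h ** 2)      # different phase shape

res_shifted = convolutive_residual(shifted, model)
res_other = convolutive_residual(other, model)
print(mean_squared_phase(res_shifted))  # large: the raw phase sees the shift
print(mspd2(res_shifted))               # negligible: the shift is cancelled
print(mspd2(res_other))                 # non-zero: the shape mismatch remains
```

With a pure time shift, the plain mean squared phase is large although the shape matches, whereas the second-order variant stays at zero and only reacts to a mismatch in the phase shape.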
0:09:07	here are some examples of those error functions
0:09:11	with three synthetic signals, and as you can see
0:09:14	those functions are
0:09:17	quite
0:09:17	simple to optimize in terms of
0:09:21	finding
0:09:23	the global minimum,
0:09:25	and this minimum seems to
0:09:28	correspond well to the synthetic value which is used
0:09:32	in this example
0:09:34	thus we use
0:09:35	a simple Brent's method in order to retrieve
0:09:38	the global minimum of those functions
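A one-dimensional search with Brent's method can be sketched with SciPy. The error curve below is only a smooth stand-in with a single minimum, not the MSPD2 function of the talk, and the value 1.2 and the search bounds (0.3, 2.5) are assumptions made for the example.

```python
import math
from scipy.optimize import minimize_scalar

TRUE_RD = 1.2  # hypothetical shape value used to build the stand-in curve

def error_function(rd):
    """Stand-in for an error curve: smooth, with one minimum at TRUE_RD."""
    return math.log(rd / TRUE_RD) ** 2

# method="bounded" is SciPy's bounded variant of Brent's method.
result = minimize_scalar(error_function, bounds=(0.3, 2.5), method="bounded")
estimated_rd = result.x
print(estimated_rd)
```

Brent's method only guarantees a local minimum inside the interval, so it suits curves like the ones described here, which have a single well-defined minimum.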
0:09:43	in terms of evaluation of this method
0:09:46	first
0:09:47	to a simple example the
0:09:49	well i'm about
0:09:50	represent a and requiring a problem of of rose
0:09:54	going from a right about right voice
0:09:56	to a tense voice
0:09:58	um in
0:09:59	a you can see and um
0:10:02	the Rd parameter which is predicted from the electroglottographic signal
0:10:08	and
0:10:09	you can also see the Rd parameter which is estimated through the MSPD2
0:10:14	based method
0:10:16	obviously there is a big bias, even if the shapes of the
0:10:21	estimates are quite similar
0:10:23	there is a big bias because
0:10:25	the link
0:10:26	the link between the vocal fold motion, the mechanical motion,
0:10:31	and the glottal pulse is not
0:10:33	known, so we can only observe some correlation
0:10:38	and below you can see in the figure
0:10:39	or
0:10:40	a small segment of speech
0:10:42	with an estimate of the Rd parameter
0:10:50	um
0:10:51	more than a few examples,
0:10:53	here is an evaluation
0:10:55	made from databases using electroglottographic signals
0:11:01	the open quotient is predicted from the Rd shape parameter and
0:11:06	then
0:11:07	the open quotient is predicted from the
0:11:11	electroglottographic signal
0:11:13	and the figure just below shows the standard deviation of these open quotients
0:11:18	for different databases
0:11:21	and as you can see the proposed method
0:11:24	seems to have a on the
0:11:26	a state of the main state of the also made a
0:11:32	so yeah
0:11:33	just as a mean to our conclusion
0:11:36	we proposed functions of phase distortion
0:11:38	which are related mainly to the shape of the glottal pulse
0:11:42	and these functions can be used
0:11:44	to estimate the shape parameters of glottal models
0:11:50	thank you for your attention
0:11:56	thank you
0:11:57	and to yes
0:11:58	a lot of time for question
0:12:09	i start the discussion with
0:12:11	you
0:12:12	in it used uh you model by decomposing uh the total
0:12:17	transfer function the product of the glottal model the vocal tract and radiation model and you mate the standard assumption
0:12:22	about the vocal tract and radiation
0:12:24	in terms of as of their face function being minimum phase so be just the tree a make a term
0:12:29	so
0:12:30	that's that
0:12:31	assumption faq
0:12:33	you of further analysis at all
0:12:35	because you said the functions are independent of the minimum-phase component
0:12:39	so
0:12:41	what if the vocal tract turns out not to be minimum phase in the end
0:12:45	yes, obviously, if the vocal tract is finally not minimum phase, it will
0:12:49	strongly influence the result of
0:12:53	the
0:12:54	estimate of the shape parameters
0:12:57	and at the same time
0:12:59	the assumption about the vocal tract filter is
0:13:02	not that
0:13:03	strong, and it seems that
0:13:05	assuming that it is minimum phase is quite
0:13:09	uh
0:13:13	a good assumption
0:13:15	but i admit that for the radiation at the mouth level it will
0:13:20	maybe be necessary to do some more work on
0:13:23	improving the model of the radiation
0:13:28	and could that explain the differences you see between your estimates and
0:13:32	the electroglottographic estimate
0:13:35	yeah
0:13:37	yes
0:13:38	indeed, because
0:13:40	you have to keep in mind that the EGG
0:13:44	sorry
0:13:45	electroglottographic signals are only linked to the motion of the vocal folds
0:13:50	while what we are trying to retrieve is an estimate of
0:13:54	uh
0:13:55	an approximation of the glottal flow
0:13:57	and the acoustic link between the two is
0:14:02	not known, and so we can only observe a correlation
0:14:06	so that's why there is so much difference
0:14:10	yes piece
0:14:11	because there
0:14:17	thanks
0:14:17	in your evaluation you compare against the IAIF method
0:14:22	you don't actually say what
0:14:25	parametrization method was used to extract OQ,
0:14:28	which i think might be crucial
0:14:30	and
0:14:31	can you can use a but last method used
0:14:33	after using D i i have that it's just this is a a a a a uh i i i
0:14:37	if is first to get it to estimate the plot of for on not to estimate part of more than
0:14:43	sure
0:14:43	so the IAIF is used first to estimate the glottal flow, and then to fit
0:14:48	a model on some pulses
0:14:50	of the estimated glottal flow
0:14:53	using a
0:14:55	gradient descent or any other method
0:14:58	we can use
0:15:02	just just as a from from my screen's the at that the method jeez
0:15:06	"'cause" it
0:15:07	it it
0:15:08	it the make the method use can to different methods use very different a key value
0:15:13	so have
0:15:14	yeah just
0:15:15	"'kay"
0:15:16	thank you
0:15:22	you have more questions
0:15:29	if this is not the case, let's thank the speaker again

FUNCTION OF PHASE-DISTORTION FOR GLOTTAL MODEL ESTIMATION

Modeling and Analysis of Speech Production

Presented by: Gilles Degottex, Author(s): Gilles Degottex, Institut de Recherche et Coordination Acoustique/Musique / CNRS, France; Axel Röbel, Xavier Rodet, Institut de Recherche et Coordination Acoustique/Musique, France