DIMITRI KANEVSKY: Hello, I'm checking transcription.
We will speak about defining the controlling parameter
in Constrained Discriminative Linear Transform
for supervised speaker adaptation.
As you can see, my speech is transcribed.
If you have any questions, please raise your hands.
You can speak into the microphone
and your question will be transcribed also.
Let me explain our problem statement.
Speaker adaptation was shown
to improve speech recognition accuracy
when limited speaker data is available.
Transform-based adaptation, like the maximum likelihood
linear transform, was shown to be very effective
for speech recognition.
It was observed that discriminative adaptation can
outperform the likelihood method, but
discriminative adaptation is not stable.
It is more sensitive to errors in the hypothesis,
and also sensitive to some control parameters.
In our paper, we investigated how to define
the control parameter
for CDLT, and we propose
one solution:
a log-linear dependence
that defines
the controlling parameter.
Let us recap the
Constrained Discriminative Linear Transform.
It is similar to constrained maximum likelihood
linear regression.
It modifies means and variances
with a speaker-specific matrix.
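As a rough illustration (not the paper's implementation), a constrained transform in the CMLLR/CDLT family can be viewed as a single speaker-specific matrix A and bias b applied in feature space, which is equivalent to jointly modifying every Gaussian's mean and variance:

```python
import numpy as np

def apply_constrained_transform(features, A, b):
    """Apply a constrained (feature-space) transform x' = A x + b.

    In the CMLLR/CDLT family, transforming every feature vector this way
    is equivalent to transforming each Gaussian's mean and covariance
    with the inverse transform, so one speaker-specific (A, b) pair
    adapts all model parameters at once.
    """
    return features @ A.T + b

# Toy example: 5 frames of 3-dimensional features.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
A = np.eye(3) * 1.1           # hypothetical speaker-specific matrix
b = np.array([0.1, -0.2, 0.0])
X_adapted = apply_constrained_transform(X, A, b)
print(X_adapted.shape)  # (5, 3)
```

Because one (A, b) pair adapts all Gaussians at once, it can be estimated from very limited speaker data.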
It uses extended Baum-Welch optimization:
to optimize the objective function,
it optimizes an auxiliary function.
These are the statistics
that are used to construct the transform.
It uses this control
parameter D.
In theory, this control parameter D needs
to be sufficiently large so
that growth of the objective function is guaranteed,
and this parameter also needs to be small
to guarantee fast learning.
We define D proportional to E in the formula.
This is the controlling parameter
that we study in our paper.
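A common convention in the extended Baum-Welch literature, shown here only as a hedged sketch (the paper's exact formula is not reproduced in this transcript), sets a per-Gaussian constant D proportional to E times that Gaussian's denominator occupancy, with a floor:

```python
def ebw_smoothing_constants(denominator_counts, E, D_min=1.0):
    """Illustrative EBW convention: D_m = max(E * gamma_m_den, D_min).

    Larger D guarantees growth of the objective but slows the update;
    smaller D learns faster but can be unstable -- exactly the
    trade-off the talk describes for the controlling parameter E.
    """
    return [max(E * gamma, D_min) for gamma in denominator_counts]

# Hypothetical denominator occupancies for three Gaussians.
print(ebw_smoothing_constants([10.0, 0.2, 3.0], E=0.5))
# [5.0, 1.0, 1.5]
```

With this convention, one global E controls every per-Gaussian D, which is why tuning E alone matters so much.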
E, like D, on
one hand needs to be sufficiently large
to guarantee growth of the objective function.
On the other hand, it needs to be small
to guarantee a fast learning rate.
E also depends on the amount of adaptation data.
If the amount of adaptation
data is large, then E needs to be small,
so the original parameters have less
impact, and conversely,
if the adaptation data amount is small, E needs to be large,
so the original parameters have a big impact.
This picture shows what effect the controlling parameter has
on recognition accuracy.
We analyzed adaptation data
from 30 seconds to four minutes,
and E took different values from 0.2 to 2.0.
As you can see from the picture,
if the amount of speaker adaptation data was more than two minutes,
we had good accuracy for E = 0.2,
but when there was less than two minutes,
accuracy degraded for small values of E.
We chose E = 0.5 as the baseline for further experiments;
as you can see from the picture,
this E is a good balance.
It is sufficiently small to give
faster learning, and it gives good word accuracy
across different amounts of data.
In another figure, we studied how the best E depends
on the amount of data.
This line shows the best E for each amount of data.
You can see it has an almost log-linear dependence.
In fact, we checked that the correlation coefficient
between the amount of data and E is close to minus 1.
This corresponds to a log-linear dependence
of parameter E on the data.
We estimated the log-linear fit on this data and
drew it as the dotted line,
which shows how E depends on the amount of data.
You can see that this dotted line is very close
to the manually tuned E.
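The fit behind such a dotted line can be sketched as ordinary least squares in log space; the data points below are invented for illustration, not the paper's measurements:

```python
import numpy as np

# Hypothetical (seconds of adaptation data, best E) pairs --
# illustrative only, not the values measured in the paper.
seconds = np.array([30.0, 60.0, 120.0, 240.0])
best_E = np.array([1.6, 1.0, 0.6, 0.35])

# Fit log E = a + b * log N, i.e. a log-linear dependence.
b, a = np.polyfit(np.log(seconds), np.log(best_E), 1)

def predict_E(n_seconds):
    """Predict the controlling parameter E from the data amount."""
    return np.exp(a + b * np.log(n_seconds))

# Negative slope: more adaptation data -> smaller predicted E.
print("slope:", round(float(b), 2))
```

A correlation coefficient of the two log quantities close to minus 1, as reported in the talk, is exactly the condition under which such a straight-line fit in log space is reliable.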
We performed recognition experiments
to see how the log-linear dependence affects recognition accuracy.
We have two test sets.
One test set consists of 26 speakers;
it was used to derive the log-linear formula.
The other test set consisted of 21 speakers.
Both sets split each speaker's speech in two parts, one
for speaker adaptation and another
as test data for each speaker.
And we ran constrained MLLR to obtain an initial transform.
For simplicity we chose a single transform.
Here our table shows recognition accuracy.
We have three strategies for E. First, E fixed at 0.5.
Second, E tuned manually, and third, E predicted
from the log-linear formula.
And we have accuracy for several amounts of data.
As you can see, for both test sets,
E that was manually tuned or estimated from the log-linear formula
outperformed recognition with fixed E.
For example, for four minutes, the manually tuned E gave a WER
about 4 percent relatively better than fixed E,
and the predicted E was comparably better.
Similarly for the second test set, predicted E was 2 percent better
than fixed E, relatively.
Since we estimated the log-linear formula on test set one
and used it for test set two,
we think that it can be generalized
to other speakers too.
So we saw in our experimental results that
the log-linear dependence helps improve accuracy.
We also investigated the question whether this
log-linear dependence holds in other situations.
We checked ridge regression.
We consider this regression model.
We use this quadratic equation to find the best value of lambda,
for each amount of data N, that has the best predictive value.
We found that when N grows, lambda is inversely proportional
to N. This gives us the log-linear formula.
This is proved in our paper.
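As a hedged sketch of that ridge-regression observation (synthetic data, not the paper's analytical proof), the closed-form estimator with lambda shrinking like 1/N behaves as follows:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: y = X @ w_true + noise.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])

for N in (20, 200, 2000):
    X = rng.standard_normal((N, 2))
    y = X @ w_true + 0.1 * rng.standard_normal(N)
    lam = 10.0 / N        # regularizer inversely proportional to N
    w = ridge_fit(X, y, lam)
    print(N, np.round(w, 2))
```

The constant 10.0 in the lambda schedule is arbitrary here; the point is only that a 1/N decay keeps the regularizer's influence balanced against the growing data term, mirroring how E should shrink as adaptation data grows.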
Here are our conclusions.
An empirical study that investigated the impact of the EBW
control parameter was performed in our work.
We found that a log-linear relationship exists
between the optimal setting of the controlling parameter E
and the amount of adaptation data.
We found that with E set by the
log-linear relationship, performance was better
than the CDLT baseline.
We proved that the log-linear relationship does exist
for ridge regression.
We can expect that the log-linear relationship holds
more generally,
in multiple settings,
since regularized linear regression is the backbone
of many learning problems.
I'm finished.
Okay.
Do you want to take your questions now?
DIMITRI KANEVSKY: Do you have question?
You need to speak here.
I can continue with your question.
Anyone?
DIMITRI KANEVSKY: What is your question?
I think I saw that you are using supervised adaptation.
Does that, is there any real task that that corresponds to,
and is it really important to do supervised adaptation?
DIMITRI KANEVSKY: Yes.
For example, in a telephone application,
when you have some service for
users and somebody logs in, some speaker logs in
to the telephone service, we can quickly adapt
to make recognition better.
This requires only a little adaptation data,
as in the experiments we ran.
I have another question.
It seems to me that it looks like you are doing this
in a sort of batch mode, and the flavor
of the log-linear relationship suggests
that you might get some improvement out of a sort
of on-line adaptation technique.
Have you considered that at all?
If you are finding this log-linear relationship,
would that hold in an on-line setting?
DIMITRI KANEVSKY: A good suggestion.
We can check this with on-line adaptation.
Okay. Thank you.
Let's thank our speakers.
(Applause).
DIMITRI KANEVSKY: Thank you to the service
who helped organize this transcription.