DIMITRI KANEVSKY: Hello, I'm checking transcription.
We will speak about defining the controlling parameter
in Constrained Discriminative Linear Transform
for supervised speaker adaptation.
As you can see, my speech is transcribed.
If you have any questions, please raise your hands.
You can speak into the microphone
and your question will be transcribed also.
Let me explain our problem statement.
Speaker adaptation was shown
to improve speech recognition accuracy
when limited speaker data is available.
Transform-based adaptation, like the maximum likelihood
linear transform, was shown to be very effective
for speech recognition.
It was observed that discriminative adaptation can
outperform the likelihood method, but
discriminative adaptation is not stable.
It is more sensitive to errors in the hypothesis,
and also sensitive to some control parameters.
In our paper, we investigated how to define
the control parameter
for CDLT, and we propose
one solution:
a log-linear dependence
that defines
the controlling parameter.
Let us recap the
Constrained Discriminative Linear Transform.
It is similar to constrained maximum likelihood
linear regression.
It modifies means and variances
with a speaker-specific matrix.
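As a rough illustration (not the paper's implementation), a constrained transform in the CMLLR/CDLT family can be viewed as a single speaker-specific matrix A and bias b applied in feature space, which is equivalent to jointly modifying every Gaussian's mean and variance:

```python
import numpy as np

def apply_constrained_transform(features, A, b):
    """Apply a constrained (feature-space) transform x' = A x + b.

    In the CMLLR/CDLT family, transforming every feature vector this way
    is equivalent to transforming each Gaussian's mean and covariance
    with the inverse transform, so one speaker-specific (A, b) pair
    adapts all model parameters at once.
    """
    return features @ A.T + b

# Toy example: 5 frames of 3-dimensional features.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
A = np.eye(3) * 1.1           # hypothetical speaker-specific matrix
b = np.array([0.1, -0.2, 0.0])
X_adapted = apply_constrained_transform(X, A, b)
print(X_adapted.shape)  # (5, 3)
```

Because one (A, b) pair adapts all Gaussians at once, it can be estimated from very limited speaker data.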
It uses extended Baum-Welch optimization:
to optimize the objective function,
it optimizes an auxiliary function.
These are the statistics
that are used to construct the transform.
It uses this control
parameter D.
In theory, this control parameter D needs
to be sufficiently large so
that growth of the objective function is guaranteed,
and this parameter also needs to be small
to guarantee fast learning.
We define D proportional to E in the formula.
This is the controlling parameter
that we study in our paper.
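A common convention in the extended Baum-Welch literature, shown here only as a hedged sketch (the paper's exact formula is not reproduced in this transcript), sets a per-Gaussian constant D proportional to E times that Gaussian's denominator occupancy, with a floor:

```python
def ebw_smoothing_constants(denominator_counts, E, D_min=1.0):
    """Illustrative EBW convention: D_m = max(E * gamma_m_den, D_min).

    Larger D guarantees growth of the objective but slows the update;
    smaller D learns faster but can be unstable -- exactly the
    trade-off the talk describes for the controlling parameter E.
    """
    return [max(E * gamma, D_min) for gamma in denominator_counts]

# Hypothetical denominator occupancies for three Gaussians.
print(ebw_smoothing_constants([10.0, 0.2, 3.0], E=0.5))
# [5.0, 1.0, 1.5]
```

With this convention, one global E controls every per-Gaussian D, which is why tuning E alone matters so much.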
E, like D, on
one hand needs to be sufficiently large
to guarantee growth of the objective function.
On the other hand, it needs to be small
to guarantee a fast learning rate.
E also depends on the amount of adaptation data.
If the amount of adaptation
data is large, then E needs to be small,
so the original parameters have less
impact, and conversely,
if the adaptation data amount is small, E needs to be large,
so the original parameters have a big impact.
This picture shows what effect the controlling parameter has
on recognition accuracy.
We analyzed adaptation data
from 30 seconds to four minutes,
and E took different values from 0.2 to 2.0.
As you can see from the picture,
if the amount of speaker adaptation data was more than two minutes,
we had good accuracy for E = 0.2,
but when there was less than two minutes,
accuracy degraded for small values of E.
We chose E = 0.5 as the baseline for further experiments;
as you can see from the picture,
this E is a good balance.
It is sufficiently small to give
faster learning, and it gives good word accuracy
across different amounts of data.
In another figure, we studied how the best E depends
on the amount of data.
This line shows the best E for each amount of data.
You can see it has an almost log-linear dependence.
In fact, we checked that the correlation coefficient
between the amount of data and E is close to minus 1.
This corresponds to a log-linear dependence
of parameter E on the data.
We estimated the log-linear fit on this data and
drew it as the dotted line,
which shows how E depends on the amount of data.
You can see that this dotted line is very close
to the manually tuned E.
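The fit behind such a dotted line can be sketched as ordinary least squares in log space; the data points below are invented for illustration, not the paper's measurements:

```python
import numpy as np

# Hypothetical (seconds of adaptation data, best E) pairs --
# illustrative only, not the values measured in the paper.
seconds = np.array([30.0, 60.0, 120.0, 240.0])
best_E = np.array([1.6, 1.0, 0.6, 0.35])

# Fit log E = a + b * log N, i.e. a log-linear dependence.
b, a = np.polyfit(np.log(seconds), np.log(best_E), 1)

def predict_E(n_seconds):
    """Predict the controlling parameter E from the data amount."""
    return np.exp(a + b * np.log(n_seconds))

# Negative slope: more adaptation data -> smaller predicted E.
print("slope:", round(float(b), 2))
```

A correlation coefficient of the two log quantities close to minus 1, as reported in the talk, is exactly the condition under which such a straight-line fit in log space is reliable.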
We performed recognition experiments
to see how the log-linear dependence affects recognition accuracy.
We have two test sets.
One test set consists of 26 speakers;
it was used to derive the log-linear formula.
The other test set consisted of 21 speakers.
Both sets split each speaker's speech in two parts, one
for speaker adaptation and another
as test data for each speaker.
And we ran constrained MLLR to obtain an initial transform.
For simplicity we chose a single transform.
Here our table shows recognition accuracy.
We have three strategies for E. First, E fixed at 0.5.
Second, E tuned manually, and third, E predicted
from the log-linear formula.
And we have accuracy for several amounts of data.
As you can see, for both test sets,
E that was manually tuned or estimated from the log-linear formula
outperformed recognition with fixed E.
For example, for four minutes, the manually tuned E gave a WER
about 4 percent relatively better than fixed E,
and the predicted E was comparably better.
Similarly for the second test set, predicted E was 2 percent better
than fixed E, relatively.
Since we estimated the log-linear formula on test set one
and used it for test set two,
we think that it can be generalized
to other speakers too.
So we saw in our experimental results that
the log-linear dependence helps improve accuracy.
We also investigated the question whether this
log-linear dependence holds in other situations.
We checked ridge regression.
We consider this regression model.
We use this quadratic equation to find the best value of lambda,
for each amount of data N, that has the best predictive value.
We found that when N grows, lambda is inversely proportional
to N. This gives us the log-linear formula.
This is proved in our paper.
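As a hedged sketch of that ridge-regression observation (synthetic data, not the paper's analytical proof), the closed-form estimator with lambda shrinking like 1/N behaves as follows:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: y = X @ w_true + noise.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])

for N in (20, 200, 2000):
    X = rng.standard_normal((N, 2))
    y = X @ w_true + 0.1 * rng.standard_normal(N)
    lam = 10.0 / N        # regularizer inversely proportional to N
    w = ridge_fit(X, y, lam)
    print(N, np.round(w, 2))
```

The constant 10.0 in the lambda schedule is arbitrary here; the point is only that a 1/N decay keeps the regularizer's influence balanced against the growing data term, mirroring how E should shrink as adaptation data grows.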
Here are our conclusions.
An empirical study that investigated the impact of the EBW
control parameter was performed in our work.
We found that a log-linear relationship exists
between the optimal setting of the controlling parameter E
and the amount of adaptation data.
We found that with E set by the
log-linear relationship, performance was better
than the CDLT baseline.
We proved that the log-linear relationship does exist
for ridge regression.
We can expect that the log-linear relationship holds
more generally,
in multiple settings,
since regularized linear regression is the backbone
of many learning problems.
I'm finished.
Okay.
Do you want to take your questions now?
DIMITRI KANEVSKY: Do you have question?
You need to speak here.
I can continue with your question.
Anyone?
DIMITRI KANEVSKY: What is your question?
I think I saw that you are using supervised adaptation.
Does that, is there any real task that that corresponds to,
and is it really important to do supervised adaptation?
DIMITRI KANEVSKY: Yes.
For example, in a telephone application,
when you have some service for
users and somebody logs in, some speaker logs in
to the telephone service, we can quickly adapt
to make recognition better.
This requires only a little adaptation data,
as in the experiments we ran.
I have another question.
It seems to me that it looks like you are doing this
in a sort of batch mode, and the flavor
of the log-linear relationship suggests
that you might get some improvement out of a sort
of on-line adaptation technique.
Have you considered that at all?
If you are finding this log-linear relationship,
would that hold in an on-line setting?
DIMITRI KANEVSKY: A good suggestion.
We can check this with on-line adaptation.
Okay. Thank you.
Let's thank our speakers.
(Applause).
DIMITRI KANEVSKY: Thank you to the service
who helped organize this transcription.