Speech transcript - Cosine Similarity Scoring without Score Normalization Techniques

0:00:06	so i think today you will be happy to see two different
0:00:09	point of view that mathematical point of view topic maybe that one generation
0:00:13	i'm not saying that what's all just
0:00:15	oh
0:00:16	an engineering solution which is maybe faster
0:00:18	and simpler, and simplifies life for us
0:00:21	and
0:00:24	so
0:00:25	the topic today is about how we can make cosine distance scoring because
0:00:29	you know
0:00:30	it was discussed this morning that with PLDA you don't need any score normalisation
0:00:34	so we try to understand
0:00:36	what score normalisation is doing in cosine scoring and how we can
0:00:40	move the score normalisation from the score space to the total variability space, the i-vector space
0:00:46	so the presentation is
0:00:48	organised as follows: first we give an introduction on the contribution of this paper
0:00:53	i'll try to define
0:00:55	what the total variability space is, what cosine distance is
0:00:59	how channel compensation works
0:01:01	in this space
0:01:03	and
0:01:04	after that i will show you how
0:01:06	to understand what score normalisation like z-norm, t-norm, zt-norm and s-norm is doing in the
0:01:11	score space for cosine distance scoring
0:01:14	and i will show you how we developed a new scoring that
0:01:18	does not need any score normalisation; it is still there but we just move it to the total variability space
0:01:23	and give some experiments and results
0:01:25	and
0:01:26	finally give a conclusion
0:01:29	so
0:01:30	we recently proposed
0:01:31	a new low-dimensional speaker representation
0:01:35	which makes life a lot easier for us you know
0:01:38	it because now we just open the water pollution working can you can try i'll try P L D A
0:01:42	wanted to the egg
0:01:43	and it can be fatal
0:01:45	with with this in this new dimension of the space
0:01:47	so
0:01:48	and we also proposed
0:01:50	cosine scoring
0:01:51	which doesn't need any target enrolment
0:01:53	you just
0:01:54	extract
0:01:55	i-vectors, or total factors, for the target and
0:01:58	test
0:01:58	and compute the cosine distance and compare it to a threshold
0:02:01	it's very easy
0:02:03	there is no complication there
0:02:04	so this makes the decision
0:02:07	faster
0:02:08	simple
0:02:08	less complex
0:02:10	there's no scatter
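A minimal sketch of the decision rule described here, assuming the total factors are already available as NumPy arrays. The function names, the 400-dimensional random vectors and the threshold value are illustrative assumptions, not values from the talk.

```python
import numpy as np

def cosine_score(w_target, w_test):
    """Cosine similarity between a target and a test total-factor vector."""
    return float(w_target @ w_test / (np.linalg.norm(w_target) * np.linalg.norm(w_test)))

def decide(w_target, w_test, threshold):
    """Accept the trial when the cosine score reaches the threshold."""
    return cosine_score(w_target, w_test) >= threshold

# Toy vectors standing in for real total factors (assumption).
rng = np.random.default_rng(0)
w_target = rng.standard_normal(400)
w_same = w_target + 0.1 * rng.standard_normal(400)   # close to the target
w_other = rng.standard_normal(400)                   # unrelated vector

print(decide(w_target, w_same, threshold=0.5))
print(decide(w_target, w_other, threshold=0.5))
```

No model is trained per target speaker: enrolment reduces to storing one vector, and a trial is one dot product and two norms.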
0:02:12	so
0:02:13	but
0:02:13	at that time we still needed score normalisation
0:02:16	so it was zt-norm or s-norm, because in the new version of the system i
0:02:20	use s-norm so
0:02:22	we did still need that so
0:02:24	so i try
0:02:25	in this paper
0:02:26	to understand
0:02:27	what score normalisation is doing
0:02:29	in the cosine distance
0:02:30	and
0:02:31	how
0:02:32	i can simulate
0:02:34	this kind of scoring in the total variability
0:02:36	space without going
0:02:38	to the score space
0:02:39	so
0:02:40	this is what i will talk about in this part and
0:02:44	why we did that is that we just want to do some speaker adaptation using cosine distance but
0:02:49	i will not talk about it; in the paper you will find some results
0:02:52	but stephen, the next
0:02:53	presenter, will talk about it
0:02:55	so if you have any question about speaker unsupervised adaptation you can talk to him, not to
0:02:59	me
0:03:01	okay
0:03:01	so
0:03:03	so if you
0:03:04	know, everyone here knows that JFA tries to split
0:03:08	the GMM supervector
0:03:10	in two parts
0:03:11	one part is the speaker space
0:03:13	okay
0:03:14	and the second is, sorry
0:03:16	so
0:03:16	the first part is the speaker space, the second part is the channel space
0:03:19	so
0:03:21	two years ago when we were at the johns hopkins workshop in two thousand eight
0:03:25	we tried to see
0:03:26	the efficiency of every latent variable
0:03:30	like speaker space, common space and channel space
0:03:33	so we
0:03:33	take every component of this
0:03:35	JFA and we put that in
0:03:38	a support vector machine
0:03:40	and we use cosine distance to see the performance
0:03:43	so
0:03:44	what was surprising was the channel factors: when we put them in the support vector machine
0:03:50	we were expecting to have an equal error rate of
0:03:52	fifty percent because
0:03:53	normally channel factors don't contain speaker information
0:03:56	we found that
0:03:56	we have
0:03:57	an equal error rate of twenty percent
0:03:59	so it means that
0:04:00	there is information that we are losing
0:04:01	in these channel factors
0:04:03	so in order to restore maybe
0:04:06	maybe be a moron could say that but
0:04:08	to minimise the impact
0:04:09	of
0:04:10	this information that we are losing from the speaker factors
0:04:13	the idea of the total factors came from that
0:04:16	so
0:04:16	the total factors were born at
0:04:18	the workshop at johns hopkins university
0:04:21	so
0:04:22	and what we did is just
0:04:23	that
0:04:24	rather than building
0:04:25	a separate speaker space and channel space, we use one
0:04:28	single space
0:04:29	which models both
0:04:31	speaker and channel variability
0:04:32	and we call that the total variability
0:04:35	and
0:04:35	so when we have a target
0:04:37	and
0:04:38	a test
0:04:39	each is
0:04:40	projected
0:04:41	both together
0:04:42	in the total variability space and we can just compute the cosine distance there
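For reference, the two models contrasted here are usually written as below. The symbols are the conventional notation from the factor analysis literature and are an assumption here, since no notation is spoken in the talk: M is the speaker- and channel-dependent GMM supervector, m the UBM mean supervector, V and U the eigenvoice and eigenchannel matrices, D the diagonal residual (the "common space"), T the total variability matrix and w the total factors.

```latex
% JFA: separate speaker, channel and common (residual) components
M = m + V y + U x + D z

% Total variability: a single low-rank space for speaker and channel effects
M = m + T w
```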
0:04:47	so
0:04:48	so the one that we use is total variability, and what is the difference between the speaker space, like eigenvoices, and
0:04:53	total variability
0:04:54	in our case
0:04:56	so
0:04:57	for the eigenvoices, for the JFA
0:04:59	for a given speaker
0:05:02	all of his recordings
0:05:04	are seen as the
0:05:05	same speaker
0:05:07	so we put all the recordings together
0:05:08	for the total variability space it is the opposite so
0:05:11	for
0:05:12	each recording of the same speaker
0:05:14	is seen as a different speaker
0:05:16	so we try to model both speaker and channel variability
0:05:19	the only thing
0:05:20	so if you have the eigenvoice
0:05:22	algorithm, it is the same thing, you should
0:05:24	the same
0:05:24	use the same algorithm but
0:05:26	just the list is different
0:05:28	okay
0:05:29	so for this
0:05:29	for the eigenvoice space we put the data from the same speaker in the same
0:05:32	file, and for total variability
0:05:35	each file
0:05:36	each recording is treated as a
0:05:37	different speaker
0:05:39	so there are different ways to estimate
0:05:41	the eigenvoices or total variability
0:05:45	so
0:05:46	with relevance MAP, we can
0:05:47	for each recording estimate
0:05:49	a GMM supervector by MAP adaptation, relevance MAP
0:05:53	and then compute PCA
0:05:55	okay
0:05:55	or you can go with eigenvoice MAP adaptation, where the GMM supervectors are not observable and we
0:06:01	yeah my good too
0:06:02	we estimate
0:06:03	all this
0:06:03	for every day
0:06:04	so
0:06:05	why we are using that eigenvoice i think because like
0:06:09	in what was in G out in a p2p uh some happen
0:06:12	university for the workshop
0:06:14	some people from you to try
0:06:15	different kind if i'm not wrong
0:06:17	like MAP, relevance MAP, and eigenvoice MAP adaptation for
0:06:21	speaker factors training
0:06:23	and we found that the best is
0:06:25	eigenvoice, maybe i'm wrong
0:06:27	you can confirm after
0:06:28	um
0:06:30	so and also
0:06:31	eigenvoice is known to be more powerful for short durations
0:06:35	so maybe it explains why it gives a better result in this case than
0:06:39	relevance MAP
0:06:42	so
0:06:44	when we have a target
0:06:45	speaker, a target recording and a test recording, we estimate the
0:06:49	total variability factors
0:06:52	vectors and
0:06:54	we just compute
0:06:55	a cosine distance score between the two
0:06:57	vectors
0:06:58	okay
0:06:58	so
0:06:59	and then compare it to a threshold; for channel compensation i just
0:07:04	first do LDA to do some dimensionality reduction and
0:07:08	to maximise the between-speaker and minimise the
0:07:11	within-class
0:07:12	within-class
0:07:13	within-speaker variability, sorry
0:07:15	and after that i do WCCN to do some kind of normalisation
0:07:19	in the
0:07:19	in the low-dimensional
0:07:22	space
0:07:23	okay
0:07:25	so
0:07:26	LDA so
0:07:27	the LDA
0:07:29	projection matrix is defined by solving this generalised eigenvalue problem, between
0:07:34	we use the between-speaker
0:07:36	variability and the within-speaker variability
0:07:39	um i think
0:07:40	my sister
0:07:41	so here there's only one remark that i need to make
0:07:44	in the first version of the
0:07:46	cosine distance i said that
0:07:48	the mean of all the
0:07:50	speakers is equal to zero because they have a normal
0:07:53	distribution
0:07:54	for the total factors
0:07:56	but in this work
0:07:57	i've
0:07:57	i
0:07:58	but it
0:07:58	like i estimated it
0:08:00	so i think i need to show that
0:08:02	i need to compute it
0:08:03	because
0:08:04	i found some
0:08:05	problems with the
0:08:06	new scoring when i don't estimate it
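One possible way to sketch the LDA step, assuming labelled background vectors are available. The scatter-matrix definitions follow a common textbook form and may differ in weighting from the paper, and `train_lda` is a hypothetical name. In line with the remark above, the global mean is estimated from the data rather than assumed to be zero.

```python
import numpy as np

def train_lda(vectors, speaker_ids, n_dims):
    """Return a (dim, n_dims) LDA projection matrix (illustrative sketch)."""
    vectors = np.asarray(vectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    global_mean = vectors.mean(axis=0)      # estimated, not assumed to be zero
    dim = vectors.shape[1]
    s_between = np.zeros((dim, dim))
    s_within = np.zeros((dim, dim))
    for spk in np.unique(speaker_ids):
        x = vectors[speaker_ids == spk]
        spk_mean = x.mean(axis=0)
        d = (spk_mean - global_mean)[:, None]
        s_between += len(x) * (d @ d.T)     # between-speaker scatter
        centered = x - spk_mean
        s_within += centered.T @ centered   # within-speaker scatter
    # Generalised eigenproblem S_b v = lambda S_w v,
    # solved here as the ordinary eigenproblem of S_w^{-1} S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1][:n_dims]
    return eigvecs[:, order].real
```

A vector `w` would then be projected as `lda.T @ w`; in the talk this takes 400-dimensional total factors down to 200 dimensions.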
0:08:11	so
0:08:12	for the WCCN what we do is
0:08:15	after estimating LDA we project all our background data in this
0:08:19	low-dimensional space, which is
0:08:20	we move from four hundred to two hundred
0:08:23	and after
0:08:24	we use the same background, or not the same, but you make sure that you use data to estimate
0:08:28	the WCCN in the two hundred space
0:08:31	so it's a
0:08:32	because the WCCN is applied
0:08:34	sorry
0:08:35	right now
0:08:36	the
0:08:37	WCCN is applied in the projected space
0:08:39	oh
0:08:40	after the LDA okay
0:08:42	so it's not
0:08:42	the original space
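A hedged sketch of the WCCN step, assuming the background vectors have already been projected by LDA. The within-class covariance is taken as the average of the per-speaker covariances, which is one common definition, and `train_wccn` is a hypothetical name.

```python
import numpy as np

def train_wccn(projected, speaker_ids):
    """Return B with B @ B.T = inv(W), W being the mean within-speaker covariance."""
    projected = np.asarray(projected, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)
    dim = projected.shape[1]
    w = np.zeros((dim, dim))
    for spk in speakers:
        x = projected[speaker_ids == spk]
        centered = x - x.mean(axis=0)
        w += centered.T @ centered / len(x)   # covariance of this speaker
    w /= len(speakers)                        # average over speakers
    return np.linalg.cholesky(np.linalg.inv(w))
```

Applying `B.T @ v` to each projected vector `v` (or `projected @ B` for a matrix of row vectors) makes the average within-speaker covariance the identity in the transformed space.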
0:08:45	so here's some kind of
0:08:47	visualisation of
0:08:48	all the steps, all this kind of stuff so
0:08:51	this is five
0:08:52	speakers, so each colour is one speaker
0:08:54	and each point is one recording of a speaker
0:08:58	so that is five
0:09:00	female speakers
0:09:01	so this is after LDA projection
0:09:03	in two dimensions
0:09:05	okay
0:09:06	so
0:09:06	if you now do the WCCN
0:09:08	so
0:09:09	is it the same scatter
0:09:11	if you have the same here as in black scale
0:09:13	so we are minimising the intra-speaker variability
0:09:17	and when you do
0:09:18	the
0:09:19	length normalisation for the cosine
0:09:21	scoring
0:09:22	you are going onto a spherical surface
0:09:24	here
0:09:24	so here the speaker one who the speaker to interspeaker tape
0:09:29	so this is why
0:09:30	to find out what what what what a fine
0:09:33	like
0:09:33	how how about explained this morning about the dissertation so all this
0:09:37	data on the same
0:09:38	fig
0:09:39	yeah
0:09:43	so
0:09:44	this is the
0:09:46	block diagram of
0:09:47	the total variability system
0:09:49	so
0:09:50	when
0:09:50	you have a look
0:09:51	first we use a lot of nontarget speakers
0:09:54	like a lot of
0:09:55	speakers with several recordings per speaker
0:09:58	and
0:09:59	i use MFCC extraction, and i use them to train a UBM
0:10:04	and after that extract
0:10:05	the Baum-Welch statistics here
0:10:08	for all the same
0:10:10	sorry
0:10:11	all the same recordings
0:10:12	and after that
0:10:13	i train the total variability matrix
0:10:16	and then
0:10:17	here extract i-vectors
0:10:19	for all these
0:10:20	uh
0:10:21	recordings and then
0:10:23	i estimate the LDA and the WCCN
0:10:26	it's not obedience ubm
0:10:28	so when i have a target
0:10:30	recording so
0:10:31	i just extract MFCCs and use the UBM to extract the Baum-Welch statistics here
0:10:35	and the total variability matrix to extract the total factors
0:10:38	and then
0:10:39	use the LDA and the WCCN to normalise
0:10:42	the
0:10:44	new vectors
0:10:45	okay
0:10:45	so when you have the test
0:10:47	it's the same process
0:10:48	using the total variability matrix
0:10:50	extract the total factors
0:10:52	and then project them with the LDA and the WCCN
0:10:55	and compute the cosine distance and make a final decision
0:11:00	so now
0:11:02	uh
0:11:03	i'll explain
0:11:04	what score normalisation is doing
0:11:06	in this space
0:11:07	okay
0:11:07	in this
0:11:08	what score normalisation does in cosine distance scoring
0:11:10	so let me simplify some equations, so this is the
0:11:14	cosine distance scoring first
0:11:15	okay
0:11:16	so let's use
0:11:18	what we call the normalised total factors, which is the projection by
0:11:22	LDA
0:11:23	and the cholesky decomposition of the within-class covariance
0:11:27	so
0:11:28	and normalised by the length
0:11:30	so
0:11:30	in this case
0:11:31	the cosine distance becomes just
0:11:33	a dot product
0:11:35	okay
0:11:36	so just
0:11:36	i just want to simplify
0:11:37	to have a dot product okay
0:11:39	so
0:11:41	so
0:11:41	this is
0:11:42	you can see all this
0:11:44	like maybe
0:11:45	because in the first paper we said that w is a feature extraction
0:11:50	so we can also see all this as a feature extraction
0:11:53	because you do
0:11:54	channel compensation
0:11:55	and the cosine becomes just
0:11:57	a dot product
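The simplification described here can be checked numerically. The sketch below assumes an LDA matrix of shape (dim, k) and a WCCN Cholesky factor of shape (k, k); both are filled with toy random values, and the function names are hypothetical.

```python
import numpy as np

def normalise(w, lda, wccn_chol):
    """Project with LDA, apply the WCCN factor, then scale to unit length."""
    v = wccn_chol.T @ (lda.T @ w)
    return v / np.linalg.norm(v)

def cosine_after_compensation(w1, w2, lda, wccn_chol):
    """Reference: cosine score computed without pre-normalising the vectors."""
    a = wccn_chol.T @ (lda.T @ w1)
    b = wccn_chol.T @ (lda.T @ w2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy matrices and vectors (assumptions, not trained models).
rng = np.random.default_rng(0)
lda = rng.standard_normal((6, 3))
wccn_chol = np.linalg.cholesky(2.0 * np.eye(3) + 0.5)
w1, w2 = rng.standard_normal(6), rng.standard_normal(6)

dot_score = float(normalise(w1, lda, wccn_chol) @ normalise(w2, lda, wccn_chol))
ref_score = cosine_after_compensation(w1, w2, lda, wccn_chol)
print(np.isclose(dot_score, ref_score))
```

Treating the whole chain as feature extraction means the normalised vectors can be stored once, after which every trial is a single dot product.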
0:11:59	so
0:12:00	now
0:12:01	if you want to see that
0:12:02	let's start with z-norm, so we have a target speaker and a set of z-norm utterances okay
0:12:07	so we
0:12:08	for each utterance you extract the total factors
0:12:11	okay
0:12:12	and need to compute
0:12:13	the mean
0:12:15	of the scores
0:12:16	the mean of the scores
0:12:17	and the standard deviation of the scores okay
0:12:19	so
0:12:20	i tried to see
0:12:21	what
0:12:22	the mean and the standard deviation are doing
0:12:24	okay, what their values are
0:12:27	so i try
0:12:28	to display that so
0:12:29	is it a
0:12:30	it so it
0:12:30	for every
0:12:32	z-norm impostor
0:12:33	i compute the scores
0:12:35	okay, just the dot product between the target
0:12:37	and the impostors
0:12:38	and
0:12:39	it's divided by N, and this is the mean
0:12:41	okay
0:12:42	so
0:12:43	the target speaker
0:12:44	if you simplify that, you take this out
0:12:46	oh
0:12:47	it's just
0:12:48	the dot product
0:12:49	between
0:12:49	the target
0:12:50	normalised total factors
0:12:52	and
0:12:52	the mean of
0:12:54	the z-norm impostors' normalised total factors
0:12:58	okay
0:12:59	so this is the mean
0:13:00	okay
0:13:01	so
0:13:02	and this is the impostors'
0:13:04	normalised total factors mean
0:13:06	okay, and N is the number
0:13:09	of
0:13:09	impostors for the z-norm
0:13:11	so if you look at the standard deviation
0:13:14	you do the same
0:13:14	process
0:13:15	you have these
0:13:17	scores
0:13:17	for
0:13:19	between the target and the z-norm impostors
0:13:22	minus the mean, which is
0:13:24	exactly this one
0:13:26	okay
0:13:27	so the dot product between the
0:13:29	normalised
0:13:32	target speaker
0:13:34	and the impostors'
0:13:35	mean
0:13:37	and if we take
0:13:39	the
0:13:39	target out
0:13:41	oh
0:13:42	so here
0:13:42	you can see this is
0:13:44	the covariance matrix
0:13:45	of the
0:13:47	z-norm impostors
0:13:50	okay
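The two identities walked through above can be verified on toy data: the mean of the z-norm scores equals the dot product of the target with the impostor mean, and the standard deviation equals the square root of the target's quadratic form with the impostor covariance. The random vectors and variable names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def unit_rows(x):
    """Scale every row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

impostors = unit_rows(rng.standard_normal((500, 20)))   # normalised z-norm impostor vectors
target = unit_rows(rng.standard_normal((1, 20)))[0]     # normalised target vector

# Score-space statistics: one dot-product score per impostor.
scores = impostors @ target
mean_scores, std_scores = scores.mean(), scores.std()

# The same statistics computed from the impostor mean and covariance (1/N form).
imp_mean = impostors.mean(axis=0)
centered = impostors - imp_mean
imp_cov = centered.T @ centered / len(impostors)
mean_vec = target @ imp_mean
std_vec = np.sqrt(target @ imp_cov @ target)

print(np.isclose(mean_scores, mean_vec), np.isclose(std_scores, std_vec))
```

Because the impostor mean and covariance do not depend on the target, they can be computed once instead of scoring every target against the whole impostor set.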
0:13:50	so
0:13:51	score normalisation, which is z-norm
0:13:53	is just
0:13:54	now if you
0:13:55	try to put it in the
0:13:57	equation of
0:13:58	the score normalisation
0:13:59	it's just
0:14:00	shifting
0:14:02	the test
0:14:03	vector by the mean
0:14:05	of the impostors
0:14:06	and doing another length normalisation
0:14:08	but this length normalisation is based only
0:14:11	on the covariance between
0:14:13	impostors
0:14:15	okay, these are impostors so
0:14:17	this means that with z-norm you are doing
0:14:20	another length normalisation where
0:14:23	the direction is based on
0:14:25	maximising the distance between
0:14:28	impostors
0:14:30	okay
0:14:31	in a similar way
0:14:34	you can find
0:14:36	that for t-norm
0:14:37	so
0:14:38	z-norm, for example, is shifting the test
0:14:40	t-norm is
0:14:41	shifting the target
0:14:43	by the mean
0:14:44	and doing
0:14:44	length normalisation of the test
0:14:46	z-norm is doing
0:14:48	length normalisation of the target
0:14:50	t-norm is doing length normalisation of the test
0:14:52	with some kind of covariance
0:14:55	between impostors
0:14:56	so
0:14:58	we propose a
0:15:00	new scoring
0:15:01	on a similar idea, which is not exactly zt-norm
0:15:05	it would save you just amaze you can also
0:15:08	we shift
0:15:09	the target
0:15:11	we use some background of impostors and we compute the mean of that
0:15:15	and we shift the target
0:15:17	that's where we shift the target
0:15:19	here
0:15:20	the normalised target
0:15:21	total factors, and also the test, by the impostor
0:15:25	mean
0:15:25	and
0:15:26	normalise the length of the test and
0:15:29	the target
0:15:30	based on
0:15:31	the covariance between
0:15:33	impostors
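A minimal sketch of a scoring function built from this description, assuming a diagonal impostor covariance (as used in the experiments reported later in the talk). Both vectors are shifted by the impostor mean and each length term is scaled by the impostor standard deviations. Applying the scaling to the unshifted vectors follows the z-norm and t-norm expressions above and is an assumption about the exact form; the function names are hypothetical.

```python
import numpy as np

def impostor_stats(impostors):
    """Mean and per-dimension standard deviation of normalised impostor vectors."""
    impostors = np.asarray(impostors, dtype=float)
    return impostors.mean(axis=0), impostors.std(axis=0)

def new_score(w_target, w_test, imp_mean, imp_std):
    """Shift both vectors by the impostor mean; normalise with the impostor spread."""
    num = (w_target - imp_mean) @ (w_test - imp_mean)
    den = np.linalg.norm(imp_std * w_target) * np.linalg.norm(imp_std * w_test)
    return float(num / den)
```

With a zero impostor mean and unit standard deviations the function reduces to the plain cosine score, and swapping target and test leaves the value unchanged.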
0:15:36	so
0:15:36	another one
0:15:38	uh is that some of that
0:15:39	i think he was and uh secondary anyway factories newspaper notice it
0:15:44	doesn't then
0:15:45	so it's as well and
0:15:47	this in this case for us and all this is exactly that's not
0:15:51	we well because what as women doing this may be seen on a systematic it's going that was eating omitting
0:15:56	always the same
0:15:57	so
0:15:57	one part is shifting the test
0:16:00	and normalising by the target, the other is shifting the target and normalising by the test
0:16:04	so this is exactly s-norm, so we can do s-norm
0:16:07	without any
0:16:08	parameter estimation, just
0:16:10	in the total variability space
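For comparison, a score-space s-norm can be sketched as a symmetric combination of a z-normalised and a t-normalised score. Taking the plain sum of the two terms is an assumption here (some descriptions use the average), and the function and argument names are hypothetical.

```python
import numpy as np

def s_norm(raw, target_vs_cohort, test_vs_cohort):
    """Symmetric normalisation: z-norm term plus t-norm term (sum form assumed)."""
    target_vs_cohort = np.asarray(target_vs_cohort, dtype=float)
    test_vs_cohort = np.asarray(test_vs_cohort, dtype=float)
    z = (raw - target_vs_cohort.mean()) / target_vs_cohort.std()   # target against impostors
    t = (raw - test_vs_cohort.mean()) / test_vs_cohort.std()       # test against impostors
    return float(z + t)
```

Here `target_vs_cohort` and `test_vs_cohort` hold the scores of the target and of the test against the same impostor cohort, so every trial still needs two sets of cohort scores, which is the cost that scoring directly in the total variability space avoids.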
0:16:12	so
0:16:13	this
0:16:14	kind of
0:16:15	scoring
0:16:15	helps a lot to speed up the process
0:16:18	so the only just compute the cosine distance so now we can do it as you know uh
0:16:22	maybe seem easy to you know or
0:16:24	complete as long
0:16:25	in this paper maybe this paper
0:16:30	so then some experiments
0:16:32	so
0:16:33	we used two thousand forty eight gaussians
0:16:35	with dimension sixty, like we have nineteen MFCCs plus energy, plus delta and double delta
0:16:40	of that
0:16:40	is an old system that they have i don't
0:16:42	do you need a date for that
0:16:44	right
0:16:44	i did or both horizontal so it doesn't and um
0:16:47	sorry for that
0:16:48	so it is four hundred total factors
0:16:50	LDA reduced to two hundred
0:16:52	and the WCCN is applied in the two hundred space
0:16:55	and
0:16:56	we use something like one thousand z-norm
0:16:59	and two hundred t-norm
0:17:01	for s-norm we combine all the impostors together
0:17:05	and
0:17:06	for the uh
0:17:08	for the mean and the covariance of the new scoring
0:17:11	we use
0:17:11	all the impostors together
0:17:14	but we use a diagonal covariance matrix for the impostors just
0:17:17	to speed up the process and make the experiments
0:17:19	we can use the full one too
0:17:22	so here
0:17:24	a lot of people ask me how you build the total variability space, so i tried to build this
0:17:28	table to show
0:17:30	how you can train your
0:17:31	LDA and with
0:17:32	which database
0:17:33	so
0:17:34	for the UBM we use switchboard
0:17:36	uh
0:17:37	switchboard cellular and landline
0:17:40	and we use NIST two thousand four and five
0:17:43	for total variability we use all the data
0:17:45	so what's the type that you have more of it is is it
0:17:48	and
0:17:49	use like
0:17:50	as a minimum, speakers that have two recordings
0:17:52	to build the total variability matrix
0:17:55	okay this is the first time that we succeeded to use fisher data in the factor analysis because patrick tried
0:17:59	in the past with the JFA and he didn't have success
0:18:02	with that
0:18:03	um
0:18:04	for LDA i use
0:18:06	switchboard and NIST two thousand four and five
0:18:09	because i try to model the
0:18:11	between-speaker variability so we need more speakers
0:18:14	for the WCCN it was surprising that
0:18:16	i found that the best result is only with NIST two thousand four and five
0:18:20	maybe because
0:18:21	in the NIST data we have this kind of speaker
0:18:23	speaking on different
0:18:25	phone numbers and telephones
0:18:26	compared to switchboard
0:18:28	i'm not sure, maybe
0:18:29	this is why we need only NIST two thousand four and five
0:18:33	okay so this is the
0:18:36	the results so
0:18:39	i tried the core condition
0:18:41	of two thousand eight
0:18:43	results only on the female part
0:18:45	portion
0:18:46	so
0:18:47	i just want to show that
0:18:49	the score normalisation is working here
0:18:50	i forgot to put the score without score normalisation, sorry
0:18:53	uh
0:18:54	so
0:18:55	this is the original scoring
0:18:57	like
0:18:57	cosine distance with zt-norm
0:18:59	as we published in the past
0:19:00	and uh
0:19:02	when you do a new
0:19:03	like
0:19:04	is uh
0:19:04	and use it you know which image it or not
0:19:06	it's
0:19:07	you improve the equal error rate by almost zero point five absolute
0:19:11	but
0:19:12	um
0:19:13	with the DCF it's the same
0:19:15	there's not a very big improvement
0:19:17	however for all trials we have some kind of a jump, because this is english trials and this is all trials, where
0:19:23	we have
0:19:23	different languages
0:19:25	and
0:19:25	here
0:19:26	the equal error rate and the DCF were very good
0:19:29	with this new scoring
0:19:32	okay
0:19:32	so
0:19:33	the s-norm
0:19:34	gives quite
0:19:35	competitive results
0:19:36	and gets better results on all trials compared to
0:19:39	the original scoring
0:19:41	and
0:19:42	so it seems like we can do score normalisation in this
0:19:45	in the total variability space, so there's no problem with that
0:19:50	so this is the ten-second ten-second results
0:19:52	so
0:19:53	here
0:19:54	big
0:19:54	i like to uh okay
0:19:56	unlike the
0:19:57	um
0:19:58	core condition
0:19:59	we find that
0:20:01	it helps a lot here
0:20:02	it's improving the performance
0:20:04	not only for the DCF but also for the equal error rate
0:20:07	and also for the DCF on all trials
0:20:09	and the s-norm was doing very well here in the ten-second ten-second
0:20:13	compared to the core condition
0:20:15	so
0:20:16	and the conclusion
0:20:18	so
0:20:19	in this paper i try to
0:20:21	simplify life
0:20:22	again
0:20:23	by doing the score normalisation in the total variability space
0:20:27	which makes the process simpler and faster
0:20:30	if you want to try to optimise the
0:20:31	cosine distance scoring
0:20:33	and
0:20:34	we do it
0:20:35	for the purpose of doing some speaker unsupervised adaptation
0:20:37	so that there's no need to update the parameters of the
0:20:41	z-norm and t-norm
0:20:43	in the unsupervised adaptation
0:20:45	so
0:20:45	stephen will talk more about that
0:20:47	after this talk
0:20:48	and thank you
0:20:58	distance for magazine
0:21:10	occlusion
0:21:11	um like you say
0:21:13	right
0:21:13	yes
0:21:15	oh
0:21:16	scroll through
0:21:19	oh
0:21:21	uh
0:21:22	hmmm
0:21:24	yeah
0:21:25	uh
0:21:26	no
0:21:27	yeah
0:21:27	oh
0:21:28	where
0:21:29	so
0:21:30	just
0:21:30	true
0:21:31	yeah
0:21:32	uh
0:21:34	use power
0:21:35	or something
0:21:36	sure
0:21:37	uh the the uh was
0:21:39	oh boy
0:21:41	most
0:21:42	yes
0:21:43	okay
0:21:44	the question
0:21:45	is that
0:21:46	right
0:21:46	oh
0:21:47	most
0:21:49	yes
0:21:50	oh
0:21:51	so
0:21:54	well
0:21:55	so
0:21:57	okay
0:21:58	no
0:21:58	yeah
0:21:59	the most
0:22:00	yeah
0:22:01	uh
0:22:01	hmmm
0:22:02	some
0:22:03	right
0:22:05	you know
0:22:07	so if you can
0:22:09	hmmm
0:22:10	yeah
0:22:11	most
0:22:12	maximum normalisation
0:22:13	such that
0:22:14	hmmm
0:22:15	so
0:22:16	school
0:22:18	so
0:22:24	but the point
0:22:25	okay so
0:22:26	so one of them so
0:22:28	selecting uh
0:22:29	emphasising you space
0:22:31	based on the different
0:22:32	right
0:22:34	which is
0:22:35	uh_huh and i'm wondering if you could modify
0:22:37	the normalisation approach
0:22:40	but you
0:22:40	you know
0:22:42	posted
0:22:42	yeah
0:22:43	you could modify
0:22:44	such that
0:22:45	who are loosely coupled
0:22:47	okay
0:22:47	function
0:22:49	score
0:22:51	that's
0:22:52	that's can be
0:22:53	good point here because
0:22:55	ah
0:22:55	this length normalisation
0:22:59	okay if i try to do
0:23:02	if i try to understand like
0:23:04	for example for z-norm so
0:23:06	i try to be okay
0:23:07	i
0:23:07	when i did cosine distance, by doing LDA and WCCN i am removing
0:23:13	the within class
0:23:14	but here with z-norm okay
0:23:15	i do another length normalisation that takes the information of
0:23:18	maximising the between speaker
0:23:21	it can be seen as a between speaker via the the map quest metric
0:23:25	so
0:23:26	it seems like
0:23:27	i am kind of losing information between speakers
0:23:30	but with the LDA and WCCN
0:23:32	that's true
0:23:33	when i see this kind of thing
0:23:34	it seems like
0:23:35	i am doing something
0:23:37	that hurts me
0:23:40	yeah right but this is a good point
0:23:42	that the basis yes or no
0:23:44	it's like
0:23:45	we have a nice
0:23:46	a dog
0:23:47	here
0:23:48	like at the end of it is it is a project that i'll do it again
0:23:51	but this may this
0:23:52	all the way that
0:23:52	i need to
0:23:53	no interaction of
0:23:55	the the speaker
0:23:57	right
0:23:58	so
0:23:59	i don't know how to do that yet
0:24:00	because like
0:24:03	looks to zero
0:24:03	huh
0:24:04	this is an excellent
0:24:05	yes
0:24:06	i
0:24:07	okay
0:24:12	i have a comment regarding the WCCN: i tried to
0:24:16	do length normalisation before the
0:24:19	WCCN
0:24:20	actually
0:24:22	it had
0:24:22	yeah
0:24:23	before the division
0:24:24	i just do length normalisation
0:24:27	before
0:24:28	it's peoples
0:24:29	and then do the WCCN
0:24:32	i tried but it didn't help
0:24:34	so we have
0:24:36	a way to talk
0:24:37	i try and
0:24:38	i think more one that also try it
0:24:40	you tried one but
0:24:42	no yes and
0:24:44	the funds not having a
0:24:47	and the
0:24:48	yeah
0:24:58	a quick question
0:24:59	um
0:25:01	so
0:25:01	so
0:25:03	cool
0:25:03	hmmm
0:25:04	you know
0:25:06	um
0:25:06	and
0:25:08	so and so i mean
0:25:11	and
0:25:12	the current remotes for
0:25:14	what
0:25:15	this is gonna prove most most
0:25:19	yeah
0:25:20	where
0:25:21	um i'm just wondering you know
0:25:25	hmmm
0:25:25	is that
0:25:26	you know
0:25:27	and the the actual models
0:25:30	yeah
0:25:31	cluster
0:25:31	but
0:25:33	um
0:25:34	um
0:25:35	like you
0:25:35	so like you used to i'm just
0:25:38	you know i said oh i see is it really
0:25:41	this is not exactly is it you know
0:25:42	i don't know so i just wanna have your
0:25:45	oh well
0:25:46	where
0:25:47	the mean and variance
0:25:49	when
0:25:50	i mean
0:25:51	is used them go denotes the
0:25:54	yeah
0:25:54	proof
0:25:55	posted
0:25:55	but
0:25:56	hmmm
0:25:57	and you don't actually need
0:26:00	or
0:26:00	hmmm
0:26:01	you wanna try to do you know in this new scoring or would you
0:26:04	no no i'm just
0:26:05	well you you have you mean in your room
0:26:08	hmmm
0:26:09	and i mean and number and you were
0:26:12	computing the number one
0:26:13	the
0:26:15	i mean and variance
0:26:17	and
0:26:18	right
0:26:19	uh_huh
0:26:20	okay
0:26:21	um
0:26:32	yeah so i don't know uh when when you do that you know which is the process
0:26:36	yes
0:26:37	yes and you explain to
0:26:39	yeah
0:26:39	so i'm just wondering where
0:26:41	when
0:26:41	my the
0:26:42	system is calibrated and also the units
0:26:45	you mean
0:26:46	okay that's good that the ah
0:26:48	i tried to understand what the third one is doing in the middle but i never succeeded to
0:26:54	yeah i know
0:26:55	i know
0:26:55	i know and it's uh
0:26:57	i
0:26:59	i never succeeded to do that but i tried to see if my
0:27:02	system is not
0:27:03	is
0:27:03	is
0:27:04	if you compare the result is not
0:27:06	what the same as it you know
0:27:08	the only nicole rate that change a little bit
0:27:10	but anyways
0:27:11	scene
0:27:11	that they have it that's not what calibre distill what kind of it
0:27:14	but
0:27:15	but
0:27:16	not good
0:27:18	if you have any comment about how we can put the third part i will be happy to
0:27:29	because like i did it i did it as normal
0:27:32	because i needed in the new version of the system but
0:27:34	did you know mine
0:27:36	i had to start but i don't know how to do it
0:27:41	and here
0:27:42	just one comment: if you are doing like
0:27:46	a mix, for example we have training on the telephone and
0:27:49	maybe the test on the microphone
0:27:50	so we can do this differently
0:27:53	based on which database you are using
0:27:55	which can help you
0:27:57	in the cross
0:27:58	uh
0:27:59	cross-channel
0:28:02	right
0:28:06	thank you very much larger than here

SESSION 4: Speaker and language recognition – scoring, confidences and calibration