Speech transcript - DISSOLVE DETECTION IN ABSTRACT VIDEO CONTENTS

0:00:13	thank you
0:00:14	so
0:00:15	i
0:00:16	work with a laboratory at the
0:00:18	University Politehnica of Bucharest
0:00:20	and also i'm of where it from a more realistic from tech
0:00:24	or shown somebody
0:00:25	so what i'm going to present
0:00:26	is a dissolve detection approach
0:00:28	which
0:00:29	copes with a particular domain namely animated movies
0:00:33	so the outline of the presentation first i'm going to state the problem
0:00:38	then present the state of the art from the literature
0:00:41	the proposed approach
0:00:43	experimental results and finally conclude the paper
0:00:46	so
0:00:47	dissolve detection is part of a more general problem which is temporal segmentation
0:00:53	of video
0:00:54	basically temporal segmentation means
0:00:57	decomposing the video into its fundamental
0:01:00	temporal units the
0:01:01	video shots
0:01:03	a video shot is
0:01:05	a sequence of images which
0:01:07	are four db P of a common or
0:01:09	and um
0:01:10	so basically to get the final movie the final
0:01:14	sequence
0:01:15	one has to put together all of these shots
0:01:18	which are are Y
0:01:19	what we call gradual transitions which are do not
0:01:22	that
0:01:23	the image
0:01:24	so basically performing the temporal segmentation means
0:01:28	in the first place detecting the video transitions
0:01:31	so we have two classes of video transitions the first
0:01:36	is called sharp transitions or cuts
0:01:38	which are the direct concatenation
0:01:40	of two different shots so here you have the time line
0:01:43	you have shot one
0:01:44	which is connected to shot two
0:01:45	so here i've got a cut
0:01:47	so
0:01:48	they are the most frequent
0:01:49	for instance
0:01:50	a mean of
0:01:51	a a bit of for the chip
0:01:53	the one cards
0:01:55	and the existing approaches are quite
0:01:59	highly accurate
0:02:00	we easily get ninety five percent correct detection
0:02:04	you can see the results of the TRECVID benchmark campaign
0:02:09	on the other hand there are the gradual transitions which are
0:02:12	fourteen time before
0:02:13	and
0:02:14	the most common in
0:02:16	natural movies or movies in general
0:02:18	are
0:02:19	fades
0:02:20	which
0:02:21	here i have given a fade in sequence
0:02:24	which is the progressive appearance of an image
0:02:27	starting with a constant image
0:02:29	typically black
0:02:30	the other
0:02:32	kind of transitions are the dissolves which are much more complex because they are
0:02:37	the transformation of one image
0:02:39	a start image
0:02:39	into a second image which is done
0:02:42	gradually
0:02:43	so
0:02:44	compared to cuts they are less frequent at least one order of magnitude less and the existing methods are not
0:02:51	very highly reliable
0:02:52	let's say we have an average
0:02:54	correct detection between seventy and four
0:02:57	for
0:02:59	so why perform the temporal segmentation
0:03:02	i i'm going to an to to to report results so
0:03:06	on the one hand it is a way of understanding
0:03:10	the structure of the
0:03:11	of the video
0:03:13	on the other hand we have the content description
0:03:16	for instance many summarisation and skimming methods are
0:03:20	based on temporal segmentation
0:03:22	there are many approaches which consider the action
0:03:26	related to a high frequency of
0:03:28	shot changes
0:03:29	and
0:03:31	in this domain which is animated movies the use of gradual transitions has
0:03:37	semantic meaning
0:03:39	okay
0:03:40	so
0:03:41	now i'm going to present
0:03:43	some of the
0:03:44	methods
0:03:45	in the field
0:03:46	let's start with a definition of this
0:03:48	transition so supposing we have two sequences two shots
0:03:52	S one and S two
0:03:54	so a dissolve
0:03:56	transition which is
0:03:58	obtained by combining the two
0:04:00	of a given duration can be expressed
0:04:03	at intensity level
0:04:06	as the linear combination of the two sequences
0:04:09	with
0:04:09	sequences sorry
0:04:11	using two linear or monotonic functions F one and F two
0:04:16	some common functions are
0:04:19	such as the ones i have presented here so F one is typically decreasing
0:04:23	from one to zero while
0:04:25	the second
0:04:25	function F two is
0:04:27	typically increasing so basically what we have here we have
0:04:31	a fade out sequence of the first shot
0:04:33	which is coupled with
0:04:35	the fade in sequence of the second shot
0:04:39	so basically we have
0:04:40	a fade out
0:04:41	coupled with a fade in
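The dissolve definition given in the talk (a weighted sum of two shots, with a decreasing weight F one and an increasing weight F two) can be sketched as follows. This is only an illustration: the function name `dissolve`, the use of flat lists of grayscale intensities, the assumption that both shots are static, and the choice of a purely linear ramp are all assumptions made for the example, not details confirmed by the speaker.

```python
def dissolve(frame_a, frame_b, duration):
    """Blend two static grayscale frames (flat lists of intensities).

    Assumes a linear ramp: f1 goes from 1 to 0, f2 goes from 0 to 1.
    """
    if duration < 2:
        raise ValueError("duration must be at least 2 frames")
    frames = []
    for t in range(duration):
        alpha = t / (duration - 1)
        f1 = 1.0 - alpha  # decreasing weight of the first shot (fade out)
        f2 = alpha        # increasing weight of the second shot (fade in)
        frames.append([f1 * a + f2 * b for a, b in zip(frame_a, frame_b)])
    return frames


# Two tiny 2-pixel "shots" blended over 3 frames.
print(dissolve([0, 100], [100, 0], 3))
```

The middle frame is an equal mix of both shots, which is why a dissolve is harder to separate from ordinary content changes than a cut.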
0:04:44	uh
0:04:45	oh
0:04:46	this kind of transition the dissolve is much more complex to detect compared to the others for instance
0:04:51	fades because first of all
0:04:53	very hard to
0:04:54	to be beat or is a separate
0:04:58	uh they tend to show similar time signatures with other camera or object motion
0:05:04	based support main evaluation colour X more
0:05:07	and they may have
0:05:09	quite similar colour motion or structure
0:05:12	information for the two source shots which is a problem
0:05:17	so the existing methods are divided into several categories the first one is
0:05:22	pixel intensity based
0:05:23	transform based
0:05:25	feature based and there are some other approaches which
0:05:28	either mix
0:05:29	the first ones or propose different solutions so i'm going to present
0:05:33	from each some representative approaches which are connected to ours
0:05:38	one of the first approaches was using image differences
0:05:43	so
0:05:43	the idea
0:05:45	was to accumulate the distance between consecutive frames
0:05:49	which
0:05:50	should be greater than a first threshold T one while
0:05:55	the difference for consecutive frames should stay below a second threshold
0:05:59	T two which is inferior to T one so basically
0:06:02	it
0:06:03	computes the successive differences
0:06:06	which are provided by a dissolve sequence
0:06:08	this works not only for dissolves but for gradual transitions in general
0:06:14	another approach uses the mathematical definition
0:06:17	so
0:06:18	basically the mean and variance of pixel intensity show
0:06:21	a linear and quadratic
0:06:24	oh
0:06:24	behaviour
0:06:25	so if you are going to compute the
0:06:29	variance
0:06:30	of a dissolve
0:06:31	sequence for different
0:06:33	moments of time
0:06:34	we get a
0:06:35	quadratic
0:06:36	behaviour
0:06:37	with the F one and F two functions
0:06:40	so if you are going to do the math and replace the two functions
0:06:43	we are going to obtain
0:06:45	a quadratic behaviour
0:06:47	according to
0:06:48	time
0:06:48	so here
0:06:49	where a b c are three constants
0:06:52	which are time and pixel independent
0:06:55	oh
0:06:55	we can detect this signature by applying first or second order derivatives
0:07:00	in order to obtain
0:07:02	either a linear
0:07:03	decrease or a constant
0:07:06	value of this function
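The quadratic behaviour of the variance can be made explicit with a short derivation. This is a sketch under stated assumptions: both shots are treated as static images during the transition, the weights are taken as the linear ramps f_1(t) = 1 - t/T and f_2(t) = t/T, and the symbols sigma_1^2, sigma_2^2 (variances of the two shots) and sigma_12 (their covariance) are introduced here for illustration; only the constants a, b, c are named in the talk.

```latex
\begin{align*}
I(x,y,t) &= f_1(t)\,S_1(x,y) + f_2(t)\,S_2(x,y),
  \qquad f_1(t) = 1 - \tfrac{t}{T},\quad f_2(t) = \tfrac{t}{T} \\
\sigma^2(t) &= f_1^2(t)\,\sigma_1^2 + f_2^2(t)\,\sigma_2^2
  + 2 f_1(t) f_2(t)\,\sigma_{12} \\
 &= a\,t^2 + b\,t + c, \\
a &= \frac{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}{T^2}, \qquad
b = \frac{2\,(\sigma_{12} - \sigma_1^2)}{T}, \qquad
c = \sigma_1^2 \\
\frac{d\sigma^2}{dt} &= 2a\,t + b \quad\text{(linear in } t\text{)}, \qquad
\frac{d^2\sigma^2}{dt^2} = 2a \quad\text{(constant)}
\end{align*}
```

Under these assumptions a, b and c depend only on the two shots and on T, not on time or on the pixel position, which matches the statement that a linear first derivative or a constant second derivative is the signature to look for.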
0:07:10	uh another approach is
0:07:12	based on the optical effect
0:07:14	a dissolve is a superposition of fade out and fade in sequences
0:07:20	so it detects the amount of fading in and fading out pixels which is also the basis for
0:07:27	our approach
0:07:28	so
0:07:29	generally intensity based approaches are very reliable
0:07:33	similar to the methods for cut detection
0:07:37	other approaches
0:07:38	are
0:07:39	transform based
0:07:40	for instance performing the detection in the compressed domain
0:07:44	this may work for real-time performance but
0:07:48	that
0:07:49	uh the effect is quite a visual one so we need
0:07:52	some kind of visual information not
0:07:54	only
0:07:55	coding
0:07:56	or frequency domain or
0:07:58	something similar so
0:08:00	usually to increase accuracy at least we have to decompress to a certain level of detail
0:08:06	the second category is feature based here i'm going to present a classic one which is based
0:08:10	on contour and edge information so it uses
0:08:13	the same assumption
0:08:15	so
0:08:16	contour pixels from the starting shot are going to disappear
0:08:20	while contour pixels from the final shot are going to appear
0:08:24	so
0:08:25	one classic approach is to compute
0:08:28	an edge change ratio
0:08:30	for
0:08:31	disappearing
0:08:32	edge
0:08:33	pixels and appearing edge pixels for instance here
0:08:36	we have
0:08:37	the amount of
0:08:38	contour pixels
0:08:40	which disappeared from the image at time K
0:08:43	divided by the total number of
0:08:45	contour points
0:08:46	so computed like that they
0:08:49	should provide a high value for a dissolve
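The edge change ratio described above can be illustrated with a small sketch. It assumes the edge maps are already available as sets of (row, column) coordinates, uses the hypothetical name `edge_change_ratio`, and takes the larger of the two ratios as the final value. The published formulation of this measure usually also dilates the edge maps to tolerate small motion; that step is left out here for brevity.

```python
def edge_change_ratio(edges_prev, edges_curr):
    """Simplified edge change ratio between two frames.

    edges_prev, edges_curr: sets of (row, col) edge pixel coordinates.
    Returns the larger of the disappearing and appearing edge ratios.
    """
    if not edges_prev or not edges_curr:
        return 0.0  # assumption: no edges means no measurable change
    # Edge pixels of the previous frame that are gone in the current one.
    ratio_out = len(edges_prev - edges_curr) / len(edges_prev)
    # Edge pixels of the current frame that were not there before.
    ratio_in = len(edges_curr - edges_prev) / len(edges_curr)
    return max(ratio_out, ratio_in)


prev_edges = {(0, 0), (0, 1)}
curr_edges = {(0, 1), (1, 1), (2, 2), (3, 3)}
print(edge_change_ratio(prev_edges, curr_edges))
```

In this toy case one of two old edge pixels disappears and three of four current edge pixels are new, so the appearing ratio dominates.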
0:08:53	other approaches
0:08:54	that to use feature points like so or see that it's at the top
0:08:58	oh the problem with
0:09:00	feature information is that it is very sensitive to motion or visual effects
0:09:05	so we do not know the information that the use most
0:09:07	in fact all of the existing dissolve detection methods are
0:09:11	actually designed to cope with natural movies because that was the target so
0:09:17	in this paper we address a particular domain which is artistic animated movies not to be
0:09:23	mistaken for cartoons
0:09:25	they are quite different
0:09:26	so
0:09:27	animation movies have become a uh
0:09:31	let's say an important entertainment industry
0:09:34	from the artistic point of view and also from the entertainment point of view
0:09:39	there are at a or a lot of commercial movie your high i i have used D
0:09:43	the of the of the
0:09:45	because i state
0:09:47	what the law uh cannot up work together and see france for instance
0:09:50	the international animated film festival
0:09:54	is one of the major events in the field there are
0:09:56	a lot of movies competing
0:09:58	so
0:10:00	it became a problem to
0:10:02	apply temporal segmentation to this
0:10:05	domain
0:10:06	the problem is
0:10:07	artistic animated movies are
0:10:09	quite different from natural ones
0:10:11	in many respects here i'm going to present some of the
0:10:14	the most
0:10:14	important so first of all there are many animation techniques
0:10:19	you've got paper drawing
0:10:20	three D
0:10:22	object animation plasticine
0:10:24	modelling so
0:10:26	the content is very different
0:10:29	also
0:10:29	the motion is not
0:10:31	always continuous due to the animation techniques there are a lot of movies which are made by
0:10:36	stop motion
0:10:38	techniques which are made frame by frame
0:10:41	also each movie tends to have a different colour palette here you have
0:10:45	i i one each or or or two images from a one and with still
0:10:50	so they tend to have
0:10:52	a specific colour palette
0:10:54	uh the content is
0:10:56	quite
0:10:56	fictional or highly abstract
0:10:58	you have a lot of visual effects
0:11:01	strange and also non physical so we cannot
0:11:06	analyse the
0:11:08	uh the events from the classic
0:11:10	point of view so
0:11:11	basically you can have anything
0:11:14	objects appear disappear
0:11:16	any kind of visual effects so there is no
0:11:18	continuous
0:11:19	oh there is no
0:11:20	continuous flow
0:11:22	so
0:11:23	the method we propose is quite simple but
0:11:26	yet efficient
0:11:28	what we do we use only intensity information
0:11:32	and for each
0:11:33	frame we are going to compute
0:11:35	what we call
0:11:36	fading pixels
0:11:37	it is a simple ratio with
0:11:39	the amount of fading out pixels
0:11:41	plus
0:11:42	the amount of fading in pixels
0:11:44	which is normalised with respect to the image size
0:11:48	so basically if we are going to plot this
0:11:52	measure against time
0:11:54	for
0:11:54	dissolves
0:11:55	we get
0:11:57	uh isolated peaks
0:11:58	the problem is how to make the difference between dissolve transitions and other
0:12:03	changes which are due to motion or visual effects
0:12:06	so for that we use a twin thresholding approach which i shall describe in the following
0:12:11	so
0:12:12	first of all
0:12:13	in order to overcome all this we are going to analyse the fading pixels
0:12:18	in a very restrained
0:12:19	time window of only three frames
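The fading pixel measure can be sketched over a three-frame window as follows. The talk does not spell out how a single pixel is classified, so the rule used here is an assumption: a pixel counts as fading in when its intensity strictly increases across the three frames and as fading out when it strictly decreases. The name `fading_pixel_ratio` and the flat-list frame representation are also illustrative.

```python
def fading_pixel_ratio(prev_frame, curr_frame, next_frame):
    """Ratio of fading pixels over a three-frame window.

    Frames are flat lists of grayscale intensities of equal length.
    Assumed rule: strictly increasing intensity = fading in,
    strictly decreasing intensity = fading out.
    """
    triples = list(zip(prev_frame, curr_frame, next_frame))
    fading_in = sum(1 for p, c, n in triples if p < c < n)
    fading_out = sum(1 for p, c, n in triples if p > c > n)
    # Normalise by the image size so the value lies between 0 and 1.
    return (fading_in + fading_out) / len(curr_frame)


prev_frame = [10, 200, 50, 50]
curr_frame = [20, 150, 50, 60]
next_frame = [30, 100, 50, 55]
print(fading_pixel_ratio(prev_frame, curr_frame, next_frame))
```

Here the first pixel fades in, the second fades out, the third is constant and the fourth is not monotonic, so half of the pixels count as fading.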
0:12:22	the dissolve localisation uses the twin thresholds so we have two situations we have
0:12:28	uh
0:12:29	dissolves which are
0:12:31	clear which provide a number of fading pixels which is quite high
0:12:35	so whenever we have
0:12:37	the number of fading pixels
0:12:38	greater than a certain threshold
0:12:41	and
0:12:42	at this value when there is a lot of fading we can declare a dissolve in the
0:12:46	interval
0:12:47	uh
0:12:48	between i
0:12:50	minus and plus
0:12:51	half T max
0:12:52	on both sides where T max is the let's say
0:12:57	average dissolve
0:12:59	length
0:13:01	so that is the simplest situation we've got
0:13:03	but there are some other dissolves which show
0:13:07	a lower
0:13:08	level of fading pixels and which are coupled with
0:13:13	other transitions like motion
0:13:15	or visual effects so we use
0:13:17	we use a second threshold which is
0:13:19	quite lower
0:13:20	lower than the first one we call it the tolerance threshold
0:13:23	when the FP is greater than the second threshold we may have a dissolve transition
0:13:29	in fact
0:13:30	uh the frame
0:13:32	may be a dissolve middle frame
0:13:34	so
0:13:35	to find if it is a dissolve
0:13:38	what we are looking for is
0:13:40	a monotonic decrease on both sides
0:13:43	of this frame
0:13:44	so basically having
0:13:46	a maximum
0:13:47	here i have the
0:13:50	FP function for
0:13:53	a segment of a movie we have
0:13:56	a clear dissolve here
0:13:58	and we have to search for that one
0:14:02	but there are some others which are
0:14:05	some other shot changes
0:14:06	what we do
0:14:07	once we have detected a peak greater than the second threshold
0:14:11	we are going
0:14:12	to
0:14:12	detect the
0:14:13	time moments where
0:14:15	the FP function starts increasing again on the right and on the left
0:14:20	once we've got those time moments
0:14:22	we are going to assess
0:14:25	the amplitude between the peak and those
0:14:27	two values which are denoted
0:14:29	u left and u right
0:14:30	so
0:14:31	the
0:14:33	transition shall be a dissolve if
0:14:37	the two differences are greater than half
0:14:40	the size of the
0:14:42	peak
0:14:43	the FP of i
0:14:45	so we are going to declare
0:14:46	a dissolve
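The twin-threshold decision described above can be sketched as below. This is an interpretation of the spoken description, not the authors' implementation: the function name `detect_dissolves`, the restriction to local maxima, and the threshold values in the example are assumptions. A peak above the strong threshold is accepted directly; a peak above only the tolerance threshold is accepted when the measure drops by more than half the peak height on both sides before it starts increasing again.

```python
def detect_dissolves(fp, t_strong, t_tolerance):
    """Return indices of frames flagged as dissolve middle frames.

    fp: list of fading pixel ratios, one per frame.
    t_strong: first (high) threshold; t_tolerance: second (lower) one.
    """
    detections = []
    for i in range(1, len(fp) - 1):
        # Only consider local maxima of the measure (assumption).
        if not (fp[i] >= fp[i - 1] and fp[i] > fp[i + 1]):
            continue
        if fp[i] > t_strong:
            detections.append(i)  # clear dissolve
            continue
        if fp[i] > t_tolerance:
            # Walk left and right until the measure starts increasing again.
            left = i
            while left > 0 and fp[left - 1] < fp[left]:
                left -= 1
            right = i
            while right < len(fp) - 1 and fp[right + 1] < fp[right]:
                right += 1
            # Both drops must exceed half of the peak value FP(i).
            if fp[i] - fp[left] > fp[i] / 2 and fp[i] - fp[right] > fp[i] / 2:
                detections.append(i)
    return detections


fp = [0.05, 0.1, 0.9, 0.1, 0.05, 0.3, 0.28, 0.3, 0.05, 0.2, 0.5, 0.2, 0.05]
print(detect_dissolves(fp, t_strong=0.8, t_tolerance=0.4))
```

In the example the peak at index 2 passes the strong threshold, the peak at index 10 passes the tolerance test because it is well isolated, and the small bumps around 0.3 are ignored.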
0:14:48	okay
0:14:49	uh we have tested our approach on
0:14:53	five hundred and S to D all that's several on a midi sequence is for each i have a peak
0:14:58	at the
0:14:59	a label according to that is the and if you could is that we have a high it difficult content
0:15:04	we shall see at the end some examples to
0:15:07	see how
0:15:08	how bizarre
0:15:09	the contents are and the average difficulty
0:15:12	so to assess the performance we use the classic
0:15:14	precision and recall ratios so precision is about false detections
0:15:18	while recall is about non detections
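The two evaluation ratios mentioned above follow the standard definitions and can be written as a small helper. The function name `precision_recall` and the counts used in the example are purely illustrative; they are not the figures reported in the talk.

```python
def precision_recall(good, false_detections, missed):
    """Standard precision and recall from detection counts.

    good: correctly detected transitions
    false_detections: detections that do not match a real transition
    missed: real transitions that were not detected
    """
    precision = good / (good + false_detections)
    recall = good / (good + missed)
    return precision, recall


# Illustrative counts only.
print(precision_recall(good=8, false_detections=2, missed=2))
```

Precision drops when false detections increase, while recall drops when real transitions are missed, which is the distinction the speaker makes.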
0:15:21	so
0:15:22	what are the results so
0:15:24	overall we got
0:15:25	a precision of
0:15:27	ninety four percent while recall is close to eighty percent that
0:15:31	you can i sixty good detections and only twenty three false detections
0:15:37	but at the sequence level
0:15:39	precision and recall ratios range from
0:15:42	at T C to one hundred and the record
0:15:45	step one P two one hundred so
0:15:47	we have
0:15:48	certain sequences for which we detect all
0:15:51	all the transitions
0:15:52	and there are some for which we
0:15:54	which are
0:15:56	very complex and we got a lower detection ratio
0:16:01	so
0:16:02	we have compared our method which is quite simple
0:16:05	to the existing approaches
0:16:07	so
0:16:08	we have chosen
0:16:09	three of them the variance of pixel intensities the one i have presented in the introduction
0:16:14	okay
0:16:15	and the edge
0:16:16	change
0:16:17	ratio which is based on contours so here we have an example for
0:16:22	one movie which is for mister part
0:16:24	so we have a
0:16:25	traced the
0:16:26	variance of pixel intensity here we have
0:16:29	the dissolve transitions which are marked with vertical lines
0:16:32	so
0:16:33	we can see that there is no parabolic shape as stated by the definition
0:16:36	so we cannot use it
0:16:39	if you are tracing the edge change ratio
0:16:42	we see it's very
0:16:44	highly sensitive to visual effects and noise
0:16:47	practically
0:16:48	unusable and if you are tracing the proposed measure
0:16:52	we can see that the dissolves are quite
0:16:56	well delimited
0:16:58	another example of a movie which is
0:17:00	the
0:17:01	complex as to buy the which shows very discontinuous content
0:17:04	we got the
0:17:05	variance which
0:17:07	shows a particular
0:17:08	signature which is not a parabolic shape
0:17:11	for green for is not reliable because we don't have a lot of in the movie
0:17:15	while that's a classic approach example for our with we get
0:17:18	very good
0:17:19	all
0:17:20	detections so
0:17:21	basically we were unable to compare the precision and recall for
0:17:27	the other
0:17:27	approaches because we couldn't
0:17:30	make them work
0:17:31	uh i'm going to show you a few examples of
0:17:34	dissolves which were successfully detected and also to see
0:17:38	the difficulty of the of here
0:17:40	uh
0:17:41	i'm going to show an atypical dissolve transition
0:17:44	it's quite strange so that's a classic animated movies a
0:17:48	it is similar to a fade but it is quite a dissolve
0:17:52	effect
0:17:53	here we've got a dissolve transition
0:17:56	in which both
0:17:57	shots are very similar from the point of view of the structure and also the colour
0:18:02	and here it's a tough
0:18:04	dissolve transition which is
0:18:06	uh
0:18:07	coupled with a lot of motion and a lot of intensity variation which
0:18:12	is also successfully detected
0:18:14	so
0:18:16	we have proposed an intensity based approach it's a simple method quite a fast and efficient method
0:18:21	to our to or corpus
0:18:23	what are the main limitations so
0:18:26	first of all is the choice of several thresholds
0:18:29	we had an able to detect of that this all model as you can
0:18:33	a channel and
0:18:34	we have some probably some of the phase which is reduced or
0:18:38	a sometimes as
0:18:40	this is you to the the pixels
0:18:43	oh
0:18:44	thank you for your attention
0:18:50	i think about them
0:18:51	any questions we have time for one
0:18:54	yes
0:18:55	to do you mine do but just getting the
0:18:57	my
0:19:12	a i i just does of forty four
0:19:15	okay
0:19:15	so for the method which uses edge information we used the Canny edge detector to know that
0:19:21	retrieval
0:19:24	okay
0:19:25	um
0:19:27	another one
0:19:29	how do you compute
0:19:30	your precision and recall
0:19:32	um are you considering an interval in which the detection of a dissolve is right or is it
0:19:37	one single frame
0:19:38	that's a good question so we are manually labelling the
0:19:43	sequences so we are basically detecting by hand where the dissolves are and uh i'm
0:19:48	considering reporting detection you find look at
0:19:51	yeah
0:19:52	yeah i'm support supporting support already the use of it
0:19:55	but at least they want people
0:19:57	oh
0:19:58	we we are not image to detect that is this
0:20:00	yeah you you like to so we are so you have to suburb and yeah in in which
0:20:04	detection detection it can see does have a someone to what them because can in fact we are not able
0:20:08	to detect the one but so we can what to
0:20:11	but we can
0:20:13	that
0:20:13	not problem it was
0:20:15	we can uh a statistic
0:20:17	we can do that is probably
0:20:19	the average dissolve length for each domain
0:20:22	for animated movies
0:20:24	was something like three seconds
0:20:25	while for natural movies maybe twice
0:20:28	or
0:20:29	that depends on the domain
0:20:31	right
0:20:32	thank you very much for example

DISSOLVE DETECTION IN ABSTRACT VIDEO CONTENTS

Video Analysis and Processing

Speaker: Bogdan Ionescu, Authors: Bogdan Ionescu, Constantin Vertan, University Politehnica of Bucharest, Romania; Patrick Lambert, University of Savoie, France