0:00:13 Thank you very much.
0:00:15 Good morning, ladies and gentlemen. My name is James, and I am presenting my work, done with my colleagues including Mike Brooks, on depth extraction for image-based rendering.
0:00:26 I am going to cover a brief introduction to the area, our algorithm, an evaluation of the output, and a conclusion.
0:00:34 What is view synthesis? We take a set of camera images and synthesize new virtual views of the scene.
0:00:42 A few real-world examples include Google Street View and 3D modelling, and current research areas such as free-viewpoint TV.
0:00:57 One approach to view synthesis is model-based rendering, where you take a detailed geometric model with a texture map and generate new views. Few images are needed, but a high degree of geometric information is required, and creating the model is often slow and computationally expensive.
0:01:16 The opposite approach would be image-based rendering, where you synthesize new views directly from the input images. No geometric information is required and you can get a realistic result; however, you do need a very large number of input images.
0:01:32 Our approach sits between these two extremes. We aim to use a reasonable number of images and a simple depth model, without a lot of computation, while retaining a fair degree of accuracy.
0:01:49 For the scene itself, using the Middlebury test set, we have a series of cameras arranged in a row. We choose one camera view, which is the key image, and segment it. We use a segment-based rather than a pixel-based method to make it more robust to noise; however, we are assuming that each segment lies on a single depth layer.
0:02:10 From the segmented key image we match segments across all of the available images to create a disparity gradient, which we can use to generate the depth map. We can then take this depth map and use it to synthesize new virtual views at any point along the whole row of images.
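The disparity-to-depth step and the fractional shift used for an intermediate view can be sketched as below. This is the generic relation for a rectified row of cameras, not the speaker's exact formulation; `focal_length`, `baseline`, and `t` are illustrative names.

```python
def disparity_to_depth(disparity, focal_length, baseline):
    """Depth from disparity for a rectified camera row: Z = f * B / d.

    All names and units are illustrative; the talk gives no exact values.
    """
    if disparity <= 0:
        return float("inf")  # zero disparity -> point at infinity
    return focal_length * baseline / disparity

def shifted_x(x, disparity, t):
    """x-coordinate of a segment pixel in a virtual view placed a
    fraction t (0 = key camera, 1 = next camera) along the baseline."""
    return x - t * disparity
```

A segment with larger disparity (closer to the cameras) therefore shifts further between views, which is what produces the parallax in the synthesized frames.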
0:02:36 So, for example, we take five input images. You can see that they cover quite a wide expanse, and there is a big jump from image to image. However, if we use our method to synthesize the intermediate views, we get a smooth transition between the frames with no major artifacts.
0:03:04 The scene itself is non-uniformly spaced: objects sit in clusters throughout the depth of the scene. We can represent this as a histogram of the disparities. On this we place our model, which is layered rather than a continuous system. There are many reasons for this: savings in time, in space, and in the complexity of the calculation.
0:03:38 If we place the layers so as to minimize the error on the disparity histogram, we can optimize the positions of these layers, so we do not waste layers on regions of no interest. The benefit of this placement approach is that we can make sure the layers correspond exactly to the peaks, which correspond to the objects themselves, so the overall error is kept small. This has numerous benefits compared to another common approach, uniformly spaced layers, which does not take the scene itself into account.
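One way to realize this kind of histogram-driven layer placement is a count-weighted 1-D k-means over the disparity histogram, so that layers are pulled onto the peaks. The talk does not specify the exact cost function or optimizer, so this is an illustrative sketch only:

```python
def place_layers(bin_disparities, bin_counts, n_layers, iters=100):
    """Choose n_layers disparity values minimising the count-weighted
    squared error over a disparity histogram, so layers land on the
    peaks (object clusters) rather than being uniformly spaced.

    A plain 1-D weighted k-means; illustrative, not the talk's method.
    """
    pairs = sorted(zip(bin_disparities, bin_counts))
    d = [p[0] for p in pairs]
    w = [p[1] for p in pairs]
    total = float(sum(w))
    # initialise at weighted quantiles so dense regions attract layers
    layers, acc, target_i = [], 0.0, 0
    targets = [(i + 0.5) / n_layers for i in range(n_layers)]
    for di, wi in zip(d, w):
        acc += wi / total
        while target_i < n_layers and acc >= targets[target_i]:
            layers.append(di)
            target_i += 1
    while len(layers) < n_layers:
        layers.append(d[-1])
    # Lloyd iterations: assign bins to nearest layer, recentre layers
    for _ in range(iters):
        sums = [0.0] * n_layers
        weights = [0.0] * n_layers
        for di, wi in zip(d, w):
            k = min(range(n_layers), key=lambda j: abs(di - layers[j]))
            sums[k] += wi * di
            weights[k] += wi
        layers = [sums[k] / weights[k] if weights[k] else layers[k]
                  for k in range(n_layers)]
    return sorted(layers)
```

With a bimodal histogram (two object clusters), two layers converge onto the two peaks instead of splitting the empty space between them, which is exactly the advantage over uniform spacing described above.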
0:04:25 The second major novel aspect is how we refine the depth map itself. From the initial depth map we have an assigned depth for each segment, and also a confidence in that assignment.
0:04:39 We start from the closest layer, because there we know there is no occlusion. We take a segment; if we are confident in its assignment, we keep it and write it to the new depth map, then move on to the next segment in this closest layer. If we are not confident, we set the segment aside for later repair.
0:05:03 As you can see, as we move to the next layer behind this one, we are only using segments that we are confident in for the next stage of occlusion testing. So if the next layer can be occluded, we use our now-accurate measurements for the closer layer to deal with the occlusion correctly, giving a much more accurate result.
0:05:26 Finally, when we have worked through all of the layers, we return to the segments we set aside, and we can recalculate them using the complete data. We can do this because of the occlusion ordering inherent in a layered system: we only need to fix occlusions and reassign segments at the points at which we hit a new layer, which maximizes the accuracy while requiring no additional levels of calculation.
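The front-to-back, confidence-gated pass followed by a repair pass can be sketched as follows. The structure (commit confident segments, set the rest aside, revisit them once the occlusion ordering is complete) follows the talk; the names are illustrative and the per-segment depth re-estimation itself is not shown.

```python
def refine_depth_map(layers_front_to_back, confident):
    """Two-pass, occlusion-ordered depth refinement sketch.

    layers_front_to_back: lists of segment ids, closest layer first.
    confident: segment id -> bool, confidence in its initial assignment.

    Pass 1 walks front to back, committing only confident segments, so
    each deeper layer is occlusion-tested against accurate data only.
    Pass 2 revisits the set-aside segments once every layer is done and
    the full occlusion ordering is known.
    """
    committed, set_aside = [], []
    for layer in layers_front_to_back:
        for seg in layer:
            if confident[seg]:
                committed.append(seg)   # safe: nothing uncertain occludes it
            else:
                set_aside.append(seg)   # repair later with complete data
    for seg in set_aside:
        # in the real algorithm, depth would be re-estimated here using
        # only committed, occlusion-tested segments in front of this one
        committed.append(seg)
    return committed, set_aside
```

The key property is that by the time a set-aside segment is revisited, everything that could occlude it has already been resolved, so no further passes are needed.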
0:05:57 Our method of evaluation is to take the initial set of images and remove some; these are not used in any way throughout the process. We synthesize these views using the remaining images and then compare them against the originals to give a baseline trace.
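The comparison metric used later in the talk is PSNR, which for this leave-out evaluation can be computed as below (a standard definition, written here for flat lists of pixel values; the speaker's implementation details are not given):

```python
import math

def psnr(reference, synthesized, peak=255.0):
    """PSNR in dB between a held-out camera image and the view
    synthesized from the remaining images (flat lists of pixel values)."""
    assert len(reference) == len(synthesized)
    mse = sum((r - s) ** 2
              for r, s in zip(reference, synthesized)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Higher values mean the synthesized view is closer to the real held-out photograph.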
0:06:22 This baseline is based on a naive approach: we do not take advantage of the key features. Instead of placed layers we use uniformly spaced layers, we do not take into account the ordering of the layers and segments when we do the occlusion handling, and we use only one key image.
0:06:38 As you can see, as we increase the number of layers, and hence the complexity of the model, the quality of our rendering increases. However, adding further layers does not keep improving the result: the quality saturates at a point, as predicted by the minimum sampling criterion, and beyond that there are no big gains.
0:07:04 However, our proposed approach, using our optimized layer positions, our trust-based occlusion handling, and our set-aside depth-map refinement, improves the results.
0:07:16 Firstly, as you can see, each added layer tends to give a bigger improvement in the results, because with each layer added we re-optimize the layer positions. Secondly, you can see that the quality plateaus at a much earlier point; this is because, by placing the layers in a non-uniform way, the minimum sampling criterion can be relaxed, though it still saturates at the same level. This is using one key image: if we use an additional key image at the other end of the sequence and merge the two results, we get a roughly 2 dB increase.
0:07:53 To assess our results, we used the ground truth provided with the test set. Using the ground-truth disparity map, we can see this result: as you can see, our best approach reaches this limit.
0:08:10 A more demanding case would be to take two images out, render all of these images, and derive results as before. In this case, however, there are fewer images for the initial disparity assignment, fewer images used to synthesize the output, and a wider baseline. There is a drop in quality, but it follows a similar path, and again it saturates at the point predicted by the minimum sampling criterion.
0:08:39 As layers are added there is again an increase in quality, due to the greater accuracy of the disparity assignment and due to the layer placement: more accurately spaced layers improve the final result. And thirdly, if we use two key images, there is a further improvement.
0:09:03 When we compare with the ground truth, the quality is very close for the four-image case, and with two key images we nearly achieve the ground-truth quality.
0:09:13 Here is an example output. This is one of the first frames from the input, for a challenging case. The mean error is only 1.4. As you can see from the thresholded error map, most errors are on object edges rather than in the middle of objects, and the PSNR is 28.4 dB.
0:09:38 Moving beyond the restriction of a row of cameras, we can use an image plane of cameras, in this case built from two rows of images. We can move up and down within the image, to the left and right within the image, and then move into the image. Our algorithm was designed to be able to move in more dimensions with further research.
0:10:18 In conclusion, our algorithm can synthesize new views with low computation but high quality. The layered approach gives a simple but effective occlusion-ordering scheme and a good approximation to the scene, as can be seen by how close our placed layers are to the ground truth; the PSNR results show this. The non-uniform spacing means fewer layers are needed to achieve the same result, and the minimum sampling criterion can be relaxed. And finally, the selection of segment ordering in the depth-map refinement step means that we can maximize the efficiency and accuracy, and refine the output with no further calculation.
0:11:01 Thank you very much. Are there any questions?
0:11:24 Um, I was wondering how you measure confidence?
0:11:30 Our confidence is based on a number of measures, found by looking at our initial assignment and seeing which segments tend to be misassigned. One of the measures we use is the size of the segment: smaller segments are far more likely to be misassigned. Another is the texture within a segment. It also seems that foreground objects are often misassigned to background objects, so we include the level of possible occlusion. All of these together give us a simple measure of how confident we are.
0:12:15 One question, and maybe it is in the explanation, but when you showed the quality graphs you mentioned the ground truth. What was it exactly?
0:12:26 Sorry, so the ground truth: the Middlebury test set provides a series of ground-truth disparity images. Normally we generate the disparity and depth map ourselves and then use it to synthesize the views. For the evaluation against the ground truth, we instead use the ground truth provided with the test set, then follow exactly the same method as the rest of the algorithm, but using that data in place of our generated data. So it is a measure of how accurate our depth map is, rather than of any other rendering deficiency.
0:13:08 If there are no more questions, let us thank the speaker again.