| 0:00:15 | okay |
|---|
| 0:00:17 | welcome to my presentation i will speak about the project i did for my |
|---|
| 0:00:22 | master's thesis in norway in collaboration with [unclear] and worked on what |
|---|
| 0:00:28 | i |
|---|
| 0:00:30 | the project was about applying particle swarm optimization which has nothing to do with particle |
|---|
| 0:00:37 | filtering |
|---|
| 0:00:39 | to human pose tracking |
|---|
| 0:00:43 | so the tracking process is that you have a 3D model of the |
|---|
| 0:00:47 | human and match it optimally to the observed image in every frame of a video |
|---|
| 0:00:56 | and because this 3D model has over thirty parameters we have to |
|---|
| 0:01:02 | divide the optimization into two stages |
|---|
| 0:01:05 | and in the first stage we only optimize the most important parameters of the model |
|---|
| 0:01:11 | which are the global position and orientation of the model and then in the |
|---|
| 0:01:16 | second stage we use a global optimization of the full model with the arms |
|---|
| 0:01:23 | and legs but we constrain the previously optimized position parameters to a smaller space so |
|---|
| 0:01:30 | as to just allow correcting small errors made in the first stage that's what we |
|---|
| 0:01:35 | call the soft partitioning |
|---|
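to make the optimizer itself concrete, here is a minimal textbook global-best particle swarm optimization loop in python. this is a generic sketch, not the exact variant or parameter settings used in the project; the sphere function at the end is only a placeholder fitness:

```python
import numpy as np

def pso(fitness, bounds, n_particles=30, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO minimizing `fitness`.
    `bounds` is a (dim, 2) array of [min, max] per parameter."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()                                   # personal best positions
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()               # global best position
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # standard velocity update: inertia + cognitive pull + social pull
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                     # stay inside the search space
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# toy usage: minimize the sphere function in 4 dimensions
best, best_f = pso(lambda p: np.sum(p**2), np.array([[-5.0, 5.0]] * 4))
```

in the project the fitness would be the image-matching function described later, and the bounds would be the (possibly constrained) pose parameter ranges.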
| 0:01:42 | the starting point of the project was the |
|---|
| 0:01:46 | lee walk dataset that was put out by balan et al in two thousand |
|---|
| 0:01:51 | and five along with their paper that describes a tracking algorithm based on the |
|---|
| 0:01:58 | annealed particle filter |
|---|
| 0:02:02 | and |
|---|
| 0:02:03 | this dataset includes grayscale video from four different views of a |
|---|
| 0:02:08 | single subject walking in a circle and also a foreground-background segmentation |
|---|
| 0:02:16 | that's used for the fitness function |
|---|
| 0:02:20 | they also published their complete algorithm in matlab and their body model and we also |
|---|
| 0:02:26 | used that and modified it |
|---|
| 0:02:33 | so the goal is to track this person with a 3D model throughout |
|---|
| 0:02:38 | the whole sequence |
|---|
| 0:02:40 | you see the tracked model in colour and also if you look closely the |
|---|
| 0:02:47 | ground truth model in black and white this ground truth model was obtained by balan |
|---|
| 0:02:53 | et al using a commercially available motion capture system as is used for |
|---|
| 0:02:59 | the movies and such |
|---|
| 0:03:06 | the actual problem our algorithm is dealing with is pose tracking |
|---|
| 0:03:11 | it relies on an initialization of the model |
|---|
| 0:03:16 | in the first frame and then tracks the model and does not do any recognition |
|---|
| 0:03:22 | of actions or such that would be an application of the algorithm for example in |
|---|
| 0:03:27 | surveillance videos where you could classify what people are doing but it's just dealing with |
|---|
| 0:03:34 | the tracking |
|---|
| 0:03:39 | the main challenges are mostly ambiguities from the 3D to 2D |
|---|
| 0:03:44 | mapping for example if you just look at the silhouette this silhouette and that |
|---|
| 0:03:49 | silhouette look exactly the same |
|---|
| 0:03:53 | but you can overcome this by using multiple camera views and so we use the |
|---|
| 0:03:57 | four camera views of this dataset and the most important problem is the high dimensionality |
|---|
| 0:04:04 | of the body model |
|---|
| 0:04:07 | we use a body model with a kinematic tree with over thirty degrees of |
|---|
| 0:04:14 | freedom |
|---|
| 0:04:15 | to model the kinematic structure of a human |
|---|
| 0:04:19 | and to model the shape we use a simple model with ten truncated cones |
|---|
| 0:04:26 | it's a very coarse model but it |
|---|
| 0:04:30 | approximates the human shape |
|---|
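to illustrate what a kinematic tree parameterization means, here is a toy planar two-joint chain: each joint angle is one entry of the pose parameter vector, and limb endpoints follow by chaining rotations. this is a heavy simplification; the actual model is 3D with over thirty degrees of freedom, and the limb lengths here are made up:

```python
import numpy as np

def forward_kinematics(angles, lengths, root=(0.0, 0.0)):
    """Return the joint positions of a planar kinematic chain.
    `angles` are relative joint angles, `lengths` the limb lengths."""
    points = [np.asarray(root, dtype=float)]
    total = 0.0
    for a, l in zip(angles, lengths):
        total += a                                   # accumulate relative angles
        step = l * np.array([np.cos(total), np.sin(total)])
        points.append(points[-1] + step)             # next joint position
    return np.array(points)

# upper arm pointing straight up (0.3 long), forearm bent back to horizontal (0.25)
pts = forward_kinematics(angles=[np.pi / 2, -np.pi / 2], lengths=[0.3, 0.25])
```

in the real model each limb segment additionally carries a truncated cone for the shape, and the tree branches at the torso into arms, legs, and head.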
| 0:04:37 | so to match the model to the observation in each frame you need |
|---|
| 0:04:41 | to define a fitness function and we use a similar one as used in |
|---|
| 0:04:46 | the two thousand ten publication of sigal and black |
|---|
| 0:04:51 | with two parts the first part is the silhouette fitness where we take the foreground-background segmentation and match |
|---|
| 0:04:59 | it to the model silhouette |
|---|
| 0:05:03 | important here is that it has to be bidirectional and what is meant by |
|---|
| 0:05:09 | this is that it has to look at how much of the model is inside the observation |
|---|
| 0:05:16 | and how much of the observation is inside the model |
|---|
| 0:05:19 | because you have to penalize that for example |
|---|
| 0:05:23 | here |
|---|
| 0:05:25 | the leg in red is outside the model but the model is almost |
|---|
| 0:05:28 | completely inside the observation |
|---|
| 0:05:32 | so this is important |
|---|
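one plausible way to write such a bidirectional silhouette term on boolean masks is sketched below. the exact weighting in the cited publication may differ; this just shows why both directions matter:

```python
import numpy as np

def silhouette_fitness(model_sil, obs_sil):
    """Bidirectional silhouette overlap on boolean masks: how much of the
    model lies inside the observed silhouette AND how much of the observed
    silhouette is covered by the model. Averaging both directions penalizes
    a model that hides inside the silhouette while leaving limbs uncovered."""
    overlap = np.logical_and(model_sil, obs_sil).sum()
    model_in_obs = overlap / max(model_sil.sum(), 1)   # model -> observation
    obs_in_model = overlap / max(obs_sil.sum(), 1)     # observation -> model
    return 0.5 * (model_in_obs + obs_in_model)

# tiny example: the model sits fully inside the silhouette but covers only half of it
obs = np.zeros((4, 4), dtype=bool); obs[1:3, 0:4] = True   # 8 observed pixels
mod = np.zeros((4, 4), dtype=bool); mod[1:3, 0:2] = True   # 4 model pixels, all inside
score = silhouette_fitness(mod, obs)
```

a one-directional version would give this pose a perfect score even though half the person is unexplained; the bidirectional version scores it 0.75.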
| 0:05:39 | and then the second part of the fitness function is an edge fitness function |
|---|
| 0:05:46 | humans produce strong edges in the images and so they are easy to extract |
|---|
| 0:05:54 | but we divide the edge fitness function for the two stages of our |
|---|
| 0:05:59 | optimization in the first stage we just look at the |
|---|
| 0:06:05 | coarse position of the person without looking at the arms and legs and so we |
|---|
| 0:06:08 | only use torso edges |
|---|
| 0:06:11 | and in the second optimization stage we look at all edges with the legs |
|---|
| 0:06:15 | and the limbs |
|---|
| 0:06:24 | this is just an overview of the fitness computation |
|---|
| 0:06:29 | you get the observed image and the projected candidate pose |
|---|
| 0:06:34 | and then you produce the silhouettes and the edges of both and we additionally mask |
|---|
| 0:06:41 | the edge image with the silhouette to get rid of spurious |
|---|
| 0:06:48 | edges in the background |
|---|
| 0:06:51 | and then we match both the silhouette and the edge images |
|---|
| 0:06:55 | and the silhouette fitness and the edge fitness are normalized separately and summed up to |
|---|
| 0:07:00 | form a final fitness value that quantifies how well a candidate pose matches an image |
|---|
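the whole fitness computation can be sketched on boolean silhouette and edge maps as below. the normalization to [0, 1] and the equal weighting of the two terms are assumptions in the spirit of the description, not the published formula:

```python
import numpy as np

def combined_fitness(model_sil, obs_sil, model_edges, obs_edges):
    """Combine a bidirectional silhouette term and an edge term into one
    fitness value. Observed edges are masked with the observed silhouette
    first to suppress spurious background edges; each term lies in [0, 1]
    before summing so neither dominates."""
    sil_overlap = np.logical_and(model_sil, obs_sil).sum()
    sil_term = 0.5 * (sil_overlap / max(model_sil.sum(), 1)
                      + sil_overlap / max(obs_sil.sum(), 1))
    obs_edges_masked = np.logical_and(obs_edges, obs_sil)   # drop background edges
    edge_overlap = np.logical_and(model_edges, obs_edges_masked).sum()
    edge_term = edge_overlap / max(model_edges.sum(), 1)
    return sil_term + edge_term          # in [0, 2], higher is better

# toy check: a perfect match scores the maximum of 2.0
sil = np.ones((4, 4), dtype=bool)
edg = np.zeros((4, 4), dtype=bool); edg[0] = True
score = combined_fitness(sil, sil, edg, edg)
```

in practice the edge matching would compare distance-transformed edge maps rather than exact pixel overlap, but the structure of the combination is the same.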
| 0:07:11 | then comes the optimization with soft partitioning as i said |
|---|
| 0:07:16 | first in image (a) you have the initialization that is the model from |
|---|
| 0:07:24 | the previous frame |
|---|
| 0:07:26 | and then you get the image here you see the foreground-background segmentation of the next |
|---|
| 0:07:32 | frame |
|---|
| 0:07:33 | and in image (b) the result of the first optimization stage is that you shift the model |
|---|
| 0:07:39 | without changing the arms or legs you shift it to the new position of the person |
|---|
| 0:07:46 | and in the second stage in image (c) we adapt the position of the arms and |
|---|
| 0:07:51 | legs in a global optimization |
|---|
| 0:07:55 | where all parameters are allowed to change even the position parameters that have been optimized previously |
|---|
| 0:08:01 | but constrained to a narrower range |
|---|
| 0:08:11 | this is to illustrate and to contrast the soft partitioning concept here you see |
|---|
| 0:08:17 | hard partitioning with two variables |
|---|
| 0:08:21 | in two steps so in the first step you optimize |
|---|
| 0:08:25 | the first |
|---|
| 0:08:28 | parameter x1 keep it fixed and in the second step optimize parameter x2 |
|---|
| 0:08:35 | and you see the optimum would be here but you can't get there because you are |
|---|
| 0:08:39 | not allowed to correct errors made in the first stage |
|---|
| 0:08:43 | so we allow small variations |
|---|
| 0:08:47 | of the previously optimized parameter |
|---|
| 0:08:50 | to open up the search space a little and correct errors we made |
|---|
| 0:08:57 | we saw in experiments that if you don't do that and you can also see it |
|---|
| 0:09:01 | in the literature that you get drift in your model if |
|---|
| 0:09:08 | you do hard partitioning in such a way |
|---|
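the effect can be reproduced on a toy problem: a made-up objective with correlated variables, optimized stage-wise by grid search rather than PSO purely for clarity. hard partitioning freezes x1 after stage one; soft partitioning lets it move again within a band:

```python
import numpy as np

# toy objective with correlated variables (fabricated for illustration);
# the optimum is at (1, 1) with value 0
def f(x1, x2):
    return (x1 + x2 - 2.0) ** 2 + 0.1 * (x1 - x2) ** 2

grid = np.linspace(-3, 3, 601)                 # 0.01 resolution

# --- hard partitioning: stage 2 keeps x1 frozen ---
x2_init = 0.0
x1_hard = grid[np.argmin(f(grid, x2_init))]    # stage 1: optimize x1 alone
x2_hard = grid[np.argmin(f(x1_hard, grid))]    # stage 2: x2 alone, x1 fixed
hard_val = f(x1_hard, x2_hard)

# --- soft partitioning: stage 2 lets x1 move within +/- 0.5 of its stage-1 value ---
x1_range = grid[np.abs(grid - x1_hard) <= 0.5]
X1, X2 = np.meshgrid(x1_range, grid)
i = np.argmin(f(X1, X2))
soft_val = f(X1.ravel()[i], X2.ravel()[i])
```

because stage one commits x1 to a value that is only optimal for the initial x2, the hard variant cannot recover, while the soft variant corrects part of that error and ends with a clearly lower objective value.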
| 0:09:16 | then to evaluate our algorithm we use the standard error measure proposed by balan |
|---|
| 0:09:23 | et al |
|---|
| 0:09:25 | that is just the mean distance of fifteen marker joints |
|---|
| 0:09:29 | between the ground truth model and the tracked model |
|---|
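this error measure is straightforward to write down; a small numpy sketch follows (the joint coordinates in the example are fabricated):

```python
import numpy as np

def tracking_error(gt_joints, est_joints):
    """Mean Euclidean distance between corresponding joints of the ground
    truth and the tracked model (the lee walk evaluation uses 15 joints)."""
    return np.linalg.norm(gt_joints - est_joints, axis=-1).mean()

# toy example: 15 fake 3D joints, with the estimate offset by 5 units along x
gt = np.zeros((15, 3))
err = tracking_error(gt, gt + np.array([5.0, 0.0, 0.0]))
```

computed per frame, this gives exactly the kind of error-over-time curve shown in the following graph.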
| 0:09:38 | this graph shows the results of five tracking runs and the mean error for |
|---|
| 0:09:45 | every frame |
|---|
| 0:09:47 | for our algorithm in black and the apf in green apf is the annealed |
|---|
| 0:09:54 | particle filter that was implemented by balan et al and proposed as a benchmark algorithm |
|---|
| 0:10:01 | both algorithms use the same number of fitness evaluations which is the time |
|---|
| 0:10:07 | consuming part of the algorithm and exactly the same fitness functions |
|---|
| 0:10:13 | and you can see that our algorithm performs better than the apf |
|---|
| 0:10:20 | this |
|---|
| 0:10:22 | peak |
|---|
| 0:10:24 | is caused as you can see in the video later by a lost leg which |
|---|
| 0:10:29 | is then reacquired in further frames of the video so it's |
|---|
| 0:10:35 | quite robust |
|---|
| 0:10:39 | this is the video for the previous graph it shows one tracking run |
|---|
| 0:10:49 | again you see the ground truth in black and the tracking results in colour |
|---|
| 0:10:54 | and it loses the arm frequently and the leg frequently but this is |
|---|
| 0:11:00 | at twenty frames per second and the original dataset is sixty frames so it's easier |
|---|
| 0:11:07 | to track at higher frame rates because of course you have smaller distances between |
|---|
| 0:11:14 | the poses |
|---|
| 0:11:17 | between the frames |
|---|
| 0:11:19 | so |
|---|
| 0:11:21 | it tracks better at sixty frames |
|---|
| 0:11:35 | so in conclusion particle swarm optimization can be applied successfully to pose tracking |
|---|
| 0:11:43 | and |
|---|
| 0:11:45 | it can even perform better than the annealed particle filter without all the |
|---|
| 0:11:52 | probabilistic |
|---|
| 0:11:56 | overhead |
|---|
| 0:11:58 | and you have to do something to overcome the high dimensionality problem of such a |
|---|
| 0:12:04 | body model and the soft partitioning approach works |
|---|
| 0:12:10 | and in our eyes works better than hard partitioning because hard partitioning approaches |
|---|
| 0:12:17 | can cause drift as in the illustration |
|---|
| 0:12:21 | and of course the body model for future approaches should be a little more |
|---|
| 0:12:25 | detailed because for example you can't model arm twists and such and this |
|---|
| 0:12:32 | gives some problems |
|---|
| 0:12:36 | so i want to thank the university for the funding and [unclear] |
|---|
| 0:12:42 | for the good collaboration and their help |
|---|
| 0:12:48 | thank you for your attention and i will be happy to answer questions |
|---|
| 0:13:07 | the model has constraints so only natural bendings of the joints are allowed |
|---|
| 0:13:15 | yep |
|---|
| 0:13:25 | yes |
|---|
| 0:13:34 | yes |
|---|
| 0:13:40 | that's an empirical value we just |
|---|
| 0:13:44 | allow only a certain amount of variation in the second stage for the |
|---|
| 0:13:48 | first-stage parameters |
|---|
| 0:13:54 | of course the optimal setting will probably be different but i mean it's a |
|---|
| 0:13:59 | general principle that you can get a coarse alignment of the body in the first |
|---|
| 0:14:04 | step and then |
|---|
| 0:14:06 | do just the arm positioning in the second step |
|---|
| 0:14:19 | we just used the ground truth model so we didn't think about initialization |
|---|
| 0:14:26 | you could use any human detector and try to initialize with it |
|---|