0:00:17Welcome to my presentation. I will speak about the project I did for my
0:00:22master's thesis in Norway, in collaboration with Ivar Austvoll and Bogdan Kwolek.
0:00:30The project was about applying particle swarm optimization, which has nothing to do with particle
0:00:37filtering,
0:00:39to human pose tracking.
0:00:43So the tracking process is that you have a 3D model of the
0:00:47human and match it optimally to the observed image in every video frame.
0:00:56And because this 3D model has over thirty parameters, we have to
0:01:02divide the optimization into two stages.
0:01:05In the first stage we only optimize the most important parameters of the model,
0:01:11which are the global position and orientation of the model, and then in the
0:01:16second stage we use a global optimization of the model, including the arms
0:01:23and legs, but we constrain the previously optimized position parameters to a smaller space,
0:01:30just to allow correcting small errors made in the first stage. That's what we
0:01:35call soft partitioning.
0:01:42The starting point of the project was
0:01:46the Lee walk dataset that was put out by Balan et al. in two thousand
0:01:51and five along with their paper that described a tracking algorithm based on the
0:01:58annealed particle filter.
0:02:03This dataset includes grayscale video from four different views of a
0:02:08single subject walking in a circle, and also a foreground-background segmentation
0:02:16that's used for the fitness function.
0:02:20They also published their complete algorithm in MATLAB and their body model, and we also
0:02:26used that and modified it.
0:02:33So the goal is to track this person with a 3D model throughout
0:02:38the whole sequence.
0:02:40You see the tracked model in colour and also, if you look closely, the
0:02:47ground truth model in black and white. This ground truth model was obtained by Balan
0:02:53et al. using a commercially available motion capture system, as is used for
0:02:59movies and such.
0:03:06The actual problem our algorithm is dealing with is pose tracking.
0:03:11It relies on an initialization of the model
0:03:16in the first frame and then tracks the model, and it does not do any recognition
0:03:22of actions or the like. That would be an application of the algorithm, for example in
0:03:27surveillance videos where you could classify what people are doing, but it's just dealing with
0:03:34the tracking.
0:03:39The main challenges are mostly ambiguities from the 3D-to-2D
0:03:44mapping. For example, if you just look at the silhouette, this silhouette and that
0:03:49silhouette look exactly the same.
0:03:53But you can overcome this by using multiple camera views, and so we use the
0:03:57four camera views of this dataset. The most important problem is the high dimensionality
0:04:04of the body model.
0:04:07We use a body model with a kinematic tree with over thirty degrees of
0:04:15freedom to model the kinematic structure of the human,
0:04:19and to model the shape we use a simple model with ten truncated cones.
0:04:26It's a very coarse model, but
0:04:30it approximates the human shape.
0:04:37So to match the model to the observation in each frame you have
0:04:41to define a fitness function, and we use one similar to the one used in
0:04:46the two thousand ten publication of Sigal and Black,
0:04:51with two parts. The first part is the silhouette fitness, where we take the foreground-background segmentation and match
0:04:59it to the model silhouette.
0:05:03Important here is that it has to be bidirectional, and what is meant by
0:05:09this is that it has to look at how much of the model is inside the observation
0:05:16and how much of the observation is inside the model,
0:05:19because you have to penalize cases where,
0:05:23for example,
0:05:25the leg in red is outside the model but the model is almost
0:05:28completely inside the observation.
0:05:32So this is important.
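
The bidirectional idea can be sketched in a few lines. This is only an illustration under assumptions, not the fitness function used in the thesis: the function name, the boolean-mask inputs, and the equal weighting of the two directions are all hypothetical choices.

```python
import numpy as np

def bidirectional_silhouette_error(model_mask, observed_mask):
    """Return a value in [0, 1]; 0 means the two silhouettes coincide.

    Hypothetical helper: both inputs are boolean images of equal shape,
    and the two directions are averaged with equal weight (an assumption).
    """
    model = np.asarray(model_mask, dtype=bool)
    observed = np.asarray(observed_mask, dtype=bool)
    model_area = model.sum()
    observed_area = observed.sum()
    if model_area == 0 or observed_area == 0:
        return 1.0  # an empty silhouette cannot match anything
    overlap = np.logical_and(model, observed).sum()
    # Direction 1: fraction of the model lying outside the observation.
    model_outside = 1.0 - overlap / model_area
    # Direction 2: fraction of the observation not covered by the model.
    observed_outside = 1.0 - overlap / observed_area
    return 0.5 * (model_outside + observed_outside)
```

With only the first direction, a small model lying entirely inside the observed silhouette would score as a perfect match; the second direction penalizes the part of the observation that the model leaves uncovered.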
0:05:39And then the second part of the fitness function is an edge fitness function.
0:05:46Humans produce strong edges in the images, and so they are easy to get.
0:05:54But we divide the edge fitness function for the two steps of our
0:05:59optimization. In the first step we just look at the
0:06:05coarse position of the person without looking at the arms and legs, and so we
0:06:08only use torso edges,
0:06:11and in the second optimization stage we look at all the edges, including the legs
0:06:15and all the limbs.
0:06:24This is just an overview of the fitness computation.
0:06:29You get the observed image and the projected candidate pose,
0:06:34and then you produce the silhouettes and the edges of both, and we additionally mask
0:06:41the edge image with the silhouette to get rid of spurious
0:06:48edges in the background.
0:06:51And then we match both pairs of images,
0:06:55and the silhouette fitness and the edge fitness are normalized separately and summed up to
0:07:00form a final fitness value that quantifies how well a candidate pose matches an image.
0:07:11Then comes the optimization with soft partitioning, as I said.
0:07:16First, in image A, you have the initialization, that is the model from
0:07:24the previous frame,
0:07:26and then you get the image; here you see the foreground-background segmentation of the next frame.
0:07:33In image B, the result of the first optimization stage, you shift the model
0:07:39without changing the arms or legs; you shift it to the new position of the person.
0:07:46And in the second stage, in image C, we adapt the position of arms and
0:07:51legs in a global optimization
0:07:55where all parameters are allowed to change, even the position parameters that have been optimized previously,
0:08:01but constrained to a narrower range.
0:08:11This is to illustrate and to contrast the soft partitioning concept. Here would be
0:08:17a hard partitioning with two variables
0:08:21in two steps. So in the first step you optimize
0:08:25the first
0:08:28parameter x1 and keep it fixed, and in the second stage you optimize parameter x2.
0:08:35You see the optimum would be here, but you can't get there because you are
0:08:39not allowed to correct errors made in the first stage.
0:08:43So we allow small variations
0:08:47of the previously optimized parameter
0:08:50to open up the search space a little and correct errors we made. We
0:08:57saw in experiments that if you don't do that, and you can also see it
0:09:01in the literature, you get drift in your model if
0:09:08you make a hard partitioning in such a way.
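
The two-variable illustration can be reproduced with a toy example. The following is a minimal sketch under stated assumptions, not the implementation from the thesis: `pso_minimize`, `two_stage`, `toy_cost`, the swarm coefficients (0.7, 1.5, 1.5), the bounds, and the soft width of 0.3 are all hypothetical. Setting `soft_width` to zero gives hard partitioning, and a positive value lets the second stage correct the first-stage parameter within a narrow range.

```python
import numpy as np

def pso_minimize(cost, lower, upper, start, n_particles=20, n_iters=60, seed=0):
    """Minimal particle swarm optimizer restricted to the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Random initial swarm inside the box; a zero-width dimension stays fixed.
    pos = lower + rng.random((n_particles, lower.size)) * (upper - lower)
    pos[0] = np.clip(start, lower, upper)  # keep the starting guess in the swarm
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_cost = np.array([cost(p) for p in pos])
    g_best = best_pos[np.argmin(best_cost)].copy()
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard velocity update: inertia + personal pull + global pull.
        vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lower, upper)
        costs = np.array([cost(p) for p in pos])
        improved = costs < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = costs[improved]
        g_best = best_pos[np.argmin(best_cost)].copy()
    return g_best, float(best_cost.min())

def two_stage(cost, start, lower, upper, stage1_dims, soft_width):
    """Stage 1 optimizes stage1_dims only; stage 2 optimizes all parameters,
    keeping stage1_dims within +/- soft_width of the stage-1 result."""
    start = np.asarray(start, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Stage 1: every parameter outside stage1_dims is pinned to its start value.
    lo1, hi1 = start.copy(), start.copy()
    lo1[stage1_dims] = lower[stage1_dims]
    hi1[stage1_dims] = upper[stage1_dims]
    x_stage1, _ = pso_minimize(cost, lo1, hi1, start)
    # Stage 2: full search, but stage1_dims only get a narrow window.
    lo2, hi2 = lower.copy(), upper.copy()
    lo2[stage1_dims] = np.maximum(lower[stage1_dims], x_stage1[stage1_dims] - soft_width)
    hi2[stage1_dims] = np.minimum(upper[stage1_dims], x_stage1[stage1_dims] + soft_width)
    return pso_minimize(cost, lo2, hi2, x_stage1)

def toy_cost(x):
    """Two coupled variables with the optimum at (1, 1); purely illustrative."""
    return (x[0] - x[1]) ** 2 + 0.1 * (x[0] + x[1] - 2.0) ** 2

bounds_lo, bounds_hi = [-3.0, -3.0], [3.0, 3.0]
hard_x, hard_cost = two_stage(toy_cost, [0.0, 0.0], bounds_lo, bounds_hi, [0], soft_width=0.0)
soft_x, soft_cost = two_stage(toy_cost, [0.0, 0.0], bounds_lo, bounds_hi, [0], soft_width=0.3)
print(f"hard partitioning cost: {hard_cost:.3f}, soft partitioning cost: {soft_cost:.3f}")
```

Because the two variables are coupled, the value of x1 found in the first stage is only optimal for the old value of x2. In the hard variant the second stage cannot move x1 at all, while the soft variant can shift it slightly toward the joint optimum.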
0:09:16Then to evaluate our algorithm we use the standard error measure proposed by Balan
0:09:23et al.,
0:09:25which is just the mean distance of fifteen marker joints
0:09:29between the ground truth model and the tracked model.
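
As a small sketch of this error measure: the function name and the array layout (one row of x, y, z coordinates per joint) are assumptions, and the result is in whatever length unit the joint coordinates use.

```python
import numpy as np

def mean_joint_error(tracked_joints, ground_truth_joints):
    """Mean Euclidean distance between corresponding joints.

    Hypothetical helper: both inputs have shape (n_joints, 3), for example
    (15, 3) for the fifteen marker joints mentioned in the talk.
    """
    tracked = np.asarray(tracked_joints, dtype=float)
    truth = np.asarray(ground_truth_joints, dtype=float)
    if tracked.shape != truth.shape:
        raise ValueError("joint arrays must have the same shape")
    # Per-joint distance, then the average over all joints.
    return float(np.linalg.norm(tracked - truth, axis=1).mean())
```

Computing this once per frame gives an error curve over the sequence, which is the kind of per-frame value that can be averaged over several tracking runs.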
0:09:38In this plot you see the results of five tracking runs and the mean error for
0:09:45every frame,
0:09:47for our algorithm in black and the APF in green. APF is the annealed
0:09:54particle filter that was implemented by Balan et al. and proposed as a benchmark algorithm.
0:10:01Both algorithms use the same number of fitness evaluations, since this is the
0:10:07time-consuming part of the algorithm, and exactly the same fitness functions.
0:10:13And you can see that our algorithm performs better than APF.
0:10:22This peak
0:10:24is caused, you can see it in the video later, by a lost leg, but
0:10:29this leg is reacquired in later frames of the video, so it's
0:10:35quite robust.
0:10:39This is the video to the previous graph; it shows one tracking run.
0:10:49Again you see the ground truth in black and the tracking results in colour,
0:10:54and it loses an arm frequently and a leg frequently, but this is
0:11:00at twenty frames per second and the original dataset is at sixty frames per second, so it's easier
0:11:07to track at higher frame rates because of course you have smaller distances between
0:11:14the poses
0:11:17between the frames.
0:11:21It tracks better at sixty frames per second.
0:11:35So in conclusion, particle swarm optimization can be applied successfully to pose tracking, and
0:11:45it can even perform better than the annealed particle filter without all the
0:11:52probabilistic framework.
0:11:58And you have to do something to overcome the high dimensionality problem of such a
0:12:04body model, and the soft partitioning approach works,
0:12:10and in our eyes works better than hard partitioning, because hard partitioning approaches
0:12:17cannot correct earlier errors, as in the illustration.
0:12:21And of course the body model for future approaches should be a little more
0:12:25detailed, because for example you can't model arm twists and such, and this
0:12:32gives some problems.
0:12:36So I want to thank the university for the funding, and Ivar Austvoll and Bogdan
0:12:42Kwolek for the good collaboration and their help.
0:12:48Thank you for your attention, and I will be happy to answer questions.
0:13:07The model has constraints, so only natural bendings of the joints are allowed.
0:13:40That's an empirical value. We just
0:13:44allow only one tenth of the variation in the second stage for the
0:13:48first parameters.
0:13:54Of course the optimal setting will probably be different, but I mean it's a
0:13:59general principle that you can get a coarse alignment of the body in the first
0:14:04step and then
0:14:06adjust the arm positioning in the second step.
0:14:19We just used the ground truth model, so we didn't think about initialization.
0:14:26You could use any human detector and try to initialize it with that.