| 0:00:15 | okay |
|---|
| 0:00:17 | welcome to my presentation i will speak about the project i did for my |
|---|
| 0:00:22 | master's thesis in norway in collaboration with [unclear] and worked on what |
|---|
| 0:00:28 | i |
|---|
| 0:00:30 | the project was about applying particle swarm optimization which has nothing to do with particle |
|---|
| 0:00:37 | filtering |
|---|
| 0:00:39 | to human pose tracking |
|---|
| 0:00:43 | so the tracking process is that you have a 3D model of the |
|---|
| 0:00:47 | human and match it optimally to the observed image in every frame of a video |
|---|
| 0:00:56 | and because this 3D model has over thirty parameters we have to |
|---|
| 0:01:02 | divide the optimization into two stages |
|---|
| 0:01:05 | and in the first stage we only optimize the most important parameters of the model |
|---|
| 0:01:11 | which are the global position and orientation of the model and then in the |
|---|
| 0:01:16 | second stage we use a global optimization of the full model with the arms |
|---|
| 0:01:23 | and legs but we constrain the previously optimized position parameters to a smaller space so |
|---|
| 0:01:30 | as to just allow correcting small errors made in the first stage that's what we |
|---|
| 0:01:35 | call the soft partitioning |
|---|
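to make the optimizer itself concrete, here is a minimal textbook global-best particle swarm optimization loop in python. this is a generic sketch, not the exact variant or parameter settings used in the project; the sphere function at the end is only a placeholder fitness:

```python
import numpy as np

def pso(fitness, bounds, n_particles=30, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO minimizing `fitness`.
    `bounds` is a (dim, 2) array of [min, max] per parameter."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()                                   # personal best positions
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()               # global best position
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # standard velocity update: inertia + cognitive pull + social pull
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                     # stay inside the search space
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# toy usage: minimize the sphere function in 4 dimensions
best, best_f = pso(lambda p: np.sum(p**2), np.array([[-5.0, 5.0]] * 4))
```

in the project the fitness would be the image-matching function described later, and the bounds would be the (possibly constrained) pose parameter ranges.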
| 0:01:42 | the starting point of the project was the |
|---|
| 0:01:46 | lee walk dataset that was put out by balan et al in two thousand |
|---|
| 0:01:51 | and five along with their paper that describes a tracking algorithm based on the |
|---|
| 0:01:58 | annealed particle filter |
|---|
| 0:02:02 | and |
|---|
| 0:02:03 | this dataset includes grayscale video from four different views of a |
|---|
| 0:02:08 | single subject walking in a circle and also a foreground-background segmentation |
|---|
| 0:02:16 | that's used for the fitness function |
|---|
| 0:02:20 | they also published their complete algorithm in matlab and their body model and we also |
|---|
| 0:02:26 | used that and modified it |
|---|
| 0:02:33 | so the goal is to track this person with a 3D model throughout |
|---|
| 0:02:38 | the whole sequence |
|---|
| 0:02:40 | you see the tracked model in colour and also if you look closely the |
|---|
| 0:02:47 | ground truth model in black and white this ground truth model was obtained by balan |
|---|
| 0:02:53 | et al using a commercially available motion capture system as is used for |
|---|
| 0:02:59 | the movies and such |
|---|
| 0:03:06 | the actual problem our algorithm is dealing with is pose tracking |
|---|
| 0:03:11 | it relies on an initialization of the model |
|---|
| 0:03:16 | in the first frame and then tracks the model and does not do any recognition |
|---|
| 0:03:22 | of actions or such that would be an application of the algorithm for example in |
|---|
| 0:03:27 | surveillance videos where you could classify what people are doing but it's just dealing with |
|---|
| 0:03:34 | the tracking |
|---|
| 0:03:39 | the main challenges are mostly ambiguities from the 3D to 2D |
|---|
| 0:03:44 | mapping for example if you just look at the silhouette this silhouette and that |
|---|
| 0:03:49 | silhouette look exactly the same |
|---|
| 0:03:53 | but you can overcome this by using multiple camera views and so we use the |
|---|
| 0:03:57 | four camera views of this dataset and the most important problem is the high dimensionality |
|---|
| 0:04:04 | of the body model |
|---|
| 0:04:07 | we use a body model with a kinematic tree with over thirty degrees of |
|---|
| 0:04:14 | freedom |
|---|
| 0:04:15 | to model the kinematic structure of a human |
|---|
| 0:04:19 | and to model the shape we use a simple model with ten truncated cones |
|---|
| 0:04:26 | it's a very coarse model but it |
|---|
| 0:04:30 | approximates the human shape |
|---|
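to illustrate what a kinematic tree parameterization means, here is a toy planar two-joint chain: each joint angle is one entry of the pose parameter vector, and limb endpoints follow by chaining rotations. this is a heavy simplification; the actual model is 3D with over thirty degrees of freedom, and the limb lengths here are made up:

```python
import numpy as np

def forward_kinematics(angles, lengths, root=(0.0, 0.0)):
    """Return the joint positions of a planar kinematic chain.
    `angles` are relative joint angles, `lengths` the limb lengths."""
    points = [np.asarray(root, dtype=float)]
    total = 0.0
    for a, l in zip(angles, lengths):
        total += a                                   # accumulate relative angles
        step = l * np.array([np.cos(total), np.sin(total)])
        points.append(points[-1] + step)             # next joint position
    return np.array(points)

# upper arm pointing straight up (0.3 long), forearm bent back to horizontal (0.25)
pts = forward_kinematics(angles=[np.pi / 2, -np.pi / 2], lengths=[0.3, 0.25])
```

in the real model each limb segment additionally carries a truncated cone for the shape, and the tree branches at the torso into arms, legs, and head.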
| 0:04:37 | so to match the model to the observation in each frame you need |
|---|
| 0:04:41 | to define a fitness function and we use a similar one as used in |
|---|
| 0:04:46 | the two thousand ten publication of sigal and black |
|---|
| 0:04:51 | with two parts the first part is the silhouette fitness where we take the foreground-background segmentation and match |
|---|
| 0:04:59 | it to the model silhouette |
|---|
| 0:05:03 | important here is that it has to be bidirectional and what is meant by |
|---|
| 0:05:09 | this is that it has to look at how much of the model is inside the observation |
|---|
| 0:05:16 | and how much of the observation is inside the model |
|---|
| 0:05:19 | because you have to penalize that for example |
|---|
| 0:05:23 | here |
|---|
| 0:05:25 | the leg in red is outside the model but the model is almost |
|---|
| 0:05:28 | completely inside the observation |
|---|
| 0:05:32 | so this is important |
|---|
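one plausible way to write such a bidirectional silhouette term on boolean masks is sketched below. the exact weighting in the cited publication may differ; this just shows why both directions matter:

```python
import numpy as np

def silhouette_fitness(model_sil, obs_sil):
    """Bidirectional silhouette overlap on boolean masks: how much of the
    model lies inside the observed silhouette AND how much of the observed
    silhouette is covered by the model. Averaging both directions penalizes
    a model that hides inside the silhouette while leaving limbs uncovered."""
    overlap = np.logical_and(model_sil, obs_sil).sum()
    model_in_obs = overlap / max(model_sil.sum(), 1)   # model -> observation
    obs_in_model = overlap / max(obs_sil.sum(), 1)     # observation -> model
    return 0.5 * (model_in_obs + obs_in_model)

# tiny example: the model sits fully inside the silhouette but covers only half of it
obs = np.zeros((4, 4), dtype=bool); obs[1:3, 0:4] = True   # 8 observed pixels
mod = np.zeros((4, 4), dtype=bool); mod[1:3, 0:2] = True   # 4 model pixels, all inside
score = silhouette_fitness(mod, obs)
```

a one-directional version would give this pose a perfect score even though half the person is unexplained; the bidirectional version scores it 0.75.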
| 0:05:39 | and then the second part of the fitness function is an edge fitness function |
|---|
| 0:05:46 | humans produce strong edges in the images and so they are easy to extract |
|---|
| 0:05:54 | but we divide the edge fitness function for the two stages of our |
|---|
| 0:05:59 | optimization in the first stage we just look at the |
|---|
| 0:06:05 | coarse position of the person without looking at the arms and legs and so we |
|---|
| 0:06:08 | only use torso edges |
|---|
| 0:06:11 | and in the second optimization stage we look at all edges with the legs |
|---|
| 0:06:15 | and the limbs |
|---|
| 0:06:24 | this is just an overview of the fitness computation |
|---|
| 0:06:29 | you get the observed image and the projected candidate pose |
|---|
| 0:06:34 | and then you produce the silhouettes and the edges of both and we additionally mask |
|---|
| 0:06:41 | the edge image with the silhouette to get rid of spurious |
|---|
| 0:06:48 | edges in the background |
|---|
| 0:06:51 | and then we match both the silhouette and the edge images |
|---|
| 0:06:55 | and the silhouette fitness and the edge fitness are normalized separately and summed up to |
|---|
| 0:07:00 | form a final fitness value that quantifies how well a candidate pose matches an image |
|---|
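the whole fitness computation can be sketched on boolean silhouette and edge maps as below. the normalization to [0, 1] and the equal weighting of the two terms are assumptions in the spirit of the description, not the published formula:

```python
import numpy as np

def combined_fitness(model_sil, obs_sil, model_edges, obs_edges):
    """Combine a bidirectional silhouette term and an edge term into one
    fitness value. Observed edges are masked with the observed silhouette
    first to suppress spurious background edges; each term lies in [0, 1]
    before summing so neither dominates."""
    sil_overlap = np.logical_and(model_sil, obs_sil).sum()
    sil_term = 0.5 * (sil_overlap / max(model_sil.sum(), 1)
                      + sil_overlap / max(obs_sil.sum(), 1))
    obs_edges_masked = np.logical_and(obs_edges, obs_sil)   # drop background edges
    edge_overlap = np.logical_and(model_edges, obs_edges_masked).sum()
    edge_term = edge_overlap / max(model_edges.sum(), 1)
    return sil_term + edge_term          # in [0, 2], higher is better

# toy check: a perfect match scores the maximum of 2.0
sil = np.ones((4, 4), dtype=bool)
edg = np.zeros((4, 4), dtype=bool); edg[0] = True
score = combined_fitness(sil, sil, edg, edg)
```

in practice the edge matching would compare distance-transformed edge maps rather than exact pixel overlap, but the structure of the combination is the same.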
| 0:07:11 | then comes the optimization with soft partitioning as i said |
|---|
| 0:07:16 | first in image (a) you have the initialization that is the model from |
|---|
| 0:07:24 | the previous frame |
|---|
| 0:07:26 | and then you get the image here you see the foreground-background segmentation of the next |
|---|
| 0:07:32 | frame |
|---|
| 0:07:33 | and in image (b) the result of the first optimization stage is that you shift the model |
|---|
| 0:07:39 | without changing the arms or legs you shift it to the new position of the person |
|---|
| 0:07:46 | and in the second stage in image (c) we adapt the position of the arms and |
|---|
| 0:07:51 | legs in a global optimization |
|---|
| 0:07:55 | where all parameters are allowed to change even the position parameters that have been optimized previously |
|---|
| 0:08:01 | but constrained to a narrower range |
|---|
| 0:08:11 | this is to illustrate and to contrast the soft partitioning concept here you see |
|---|
| 0:08:17 | hard partitioning with two variables |
|---|
| 0:08:21 | in two steps so in the first step you optimize |
|---|
| 0:08:25 | the first |
|---|
| 0:08:28 | parameter x1 keep it fixed and in the second step optimize parameter x2 |
|---|
| 0:08:35 | and you see the optimum would be here but you can't get there because you are |
|---|
| 0:08:39 | not allowed to correct errors made in the first stage |
|---|
| 0:08:43 | so we allow small variations |
|---|
| 0:08:47 | of the previously optimized parameter |
|---|
| 0:08:50 | to open up the search space a little and correct errors we made |
|---|
| 0:08:57 | we saw in experiments that if you don't do that and you can also see it |
|---|
| 0:09:01 | in the literature that you get drift in your model if |
|---|
| 0:09:08 | you do hard partitioning in such a way |
|---|
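the effect can be reproduced on a toy problem: a made-up objective with correlated variables, optimized stage-wise by grid search rather than PSO purely for clarity. hard partitioning freezes x1 after stage one; soft partitioning lets it move again within a band:

```python
import numpy as np

# toy objective with correlated variables (fabricated for illustration);
# the optimum is at (1, 1) with value 0
def f(x1, x2):
    return (x1 + x2 - 2.0) ** 2 + 0.1 * (x1 - x2) ** 2

grid = np.linspace(-3, 3, 601)                 # 0.01 resolution

# --- hard partitioning: stage 2 keeps x1 frozen ---
x2_init = 0.0
x1_hard = grid[np.argmin(f(grid, x2_init))]    # stage 1: optimize x1 alone
x2_hard = grid[np.argmin(f(x1_hard, grid))]    # stage 2: x2 alone, x1 fixed
hard_val = f(x1_hard, x2_hard)

# --- soft partitioning: stage 2 lets x1 move within +/- 0.5 of its stage-1 value ---
x1_range = grid[np.abs(grid - x1_hard) <= 0.5]
X1, X2 = np.meshgrid(x1_range, grid)
i = np.argmin(f(X1, X2))
soft_val = f(X1.ravel()[i], X2.ravel()[i])
```

because stage one commits x1 to a value that is only optimal for the initial x2, the hard variant cannot recover, while the soft variant corrects part of that error and ends with a clearly lower objective value.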
| 0:09:16 | then to evaluate our algorithm we use the standard error measure proposed by balan |
|---|
| 0:09:23 | et al |
|---|
| 0:09:25 | that is just the mean distance of fifteen marker joints |
|---|
| 0:09:29 | between the ground truth model and the tracked model |
|---|
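this error measure is straightforward to write down; a small numpy sketch follows (the joint coordinates in the example are fabricated):

```python
import numpy as np

def tracking_error(gt_joints, est_joints):
    """Mean Euclidean distance between corresponding joints of the ground
    truth and the tracked model (the lee walk evaluation uses 15 joints)."""
    return np.linalg.norm(gt_joints - est_joints, axis=-1).mean()

# toy example: 15 fake 3D joints, with the estimate offset by 5 units along x
gt = np.zeros((15, 3))
err = tracking_error(gt, gt + np.array([5.0, 0.0, 0.0]))
```

computed per frame, this gives exactly the kind of error-over-time curve shown in the following graph.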
| 0:09:38 | this graph shows the results of five tracking runs and the mean error for |
|---|
| 0:09:45 | every frame |
|---|
| 0:09:47 | for our algorithm in black and the apf in green apf is the annealed |
|---|
| 0:09:54 | particle filter that was implemented by balan et al and proposed as a benchmark algorithm |
|---|
| 0:10:01 | both algorithms use the same number of fitness evaluations which is the time |
|---|
| 0:10:07 | consuming part of the algorithm and exactly the same fitness functions |
|---|
| 0:10:13 | and you can see that our algorithm performs better than the apf |
|---|
| 0:10:20 | this |
|---|
| 0:10:22 | peak |
|---|
| 0:10:24 | is caused as you can see in the video later by a lost leg which |
|---|
| 0:10:29 | is then reacquired in further frames of the video so it's |
|---|
| 0:10:35 | quite robust |
|---|
| 0:10:39 | this is the video for the previous graph it shows one tracking run |
|---|
| 0:10:49 | again you see the ground truth in black and the tracking results in colour |
|---|
| 0:10:54 | and it loses the arm frequently and the leg frequently but this is |
|---|
| 0:11:00 | at twenty frames per second and the original dataset is sixty frames so it's easier |
|---|
| 0:11:07 | to track at higher frame rates because of course you have smaller distances between |
|---|
| 0:11:14 | the poses |
|---|
| 0:11:17 | between the frames |
|---|
| 0:11:19 | so |
|---|
| 0:11:21 | it tracks better at sixty frames |
|---|
| 0:11:35 | so in conclusion particle swarm optimization can be applied successfully to pose tracking |
|---|
| 0:11:43 | and |
|---|
| 0:11:45 | it can even perform better than the annealed particle filter without all the |
|---|
| 0:11:52 | probabilistic |
|---|
| 0:11:56 | overhead |
|---|
| 0:11:58 | and you have to do something to overcome the high dimensionality problem of such a |
|---|
| 0:12:04 | body model and the soft partitioning approach works |
|---|
| 0:12:10 | and in our eyes works better than hard partitioning because hard partitioning approaches |
|---|
| 0:12:17 | can cause drift as in the illustration |
|---|
| 0:12:21 | and of course the body model for future approaches should be a little more |
|---|
| 0:12:25 | detailed because for example you can't model arm twists and such and this |
|---|
| 0:12:32 | gives some problems |
|---|
| 0:12:36 | so i want to thank the university for the funding and [unclear] |
|---|
| 0:12:42 | for the good collaboration and their help |
|---|
| 0:12:48 | thank you for your attention and i will be happy to answer questions |
|---|
| 0:13:07 | the model has constraints so only natural bendings of the joints are allowed |
|---|
| 0:13:15 | yep |
|---|
| 0:13:25 | yes |
|---|
| 0:13:34 | yes |
|---|
| 0:13:40 | that's an empirical value we just |
|---|
| 0:13:44 | allow only a certain amount of variation in the second stage for the |
|---|
| 0:13:48 | first-stage parameters |
|---|
| 0:13:54 | of course the optimal setting will probably be different but i mean it's a |
|---|
| 0:13:59 | general principle that you can get a coarse alignment of the body in the first |
|---|
| 0:14:04 | step and then |
|---|
| 0:14:06 | do just the arm positioning in the second step |
|---|
| 0:14:19 | we just used the ground truth model so we didn't think about initialization |
|---|
| 0:14:26 | you could use any human detector and try to initialize with it |
|---|