0:00:15 I will present a method to detect humans in video streams, and here is the outline of my presentation.
0:00:26 First, I will explain why, from our point of view, the detection of humans in video streams is a binary classification problem.
0:00:34 Then we will design a new, robust, probabilistic approach to classify silhouettes.
0:00:40 I will then present the dataset we used to assess our method and the results,
0:00:45 and I will give a short conclusion.
0:00:49 So let's start with the definition of the problem: the detection of humans in video streams.
0:00:56 Detecting humans in video streams is useful for many applications, such as video surveillance,
0:01:01 but the task we address is to detect only humans, and nothing else.
0:01:10 So the cornerstone of such an application is the ability to classify the observations into two classes, which are human and non-human.
0:01:22 A first approach to detect humans has been proposed for still images.
0:01:27 However, such an approach has a lot of drawbacks.
0:01:30 It is based on appearance, and usually the appearance of humans is unpredictable, because colours and textures vary.
0:01:43 Also, a lot of detection windows have to be considered per image, because we have to look for humans at a lot of positions and at a lot of scales.
0:01:52 The consequence is that it is difficult to obtain a low false alarm rate per image.
0:01:59 A better approach consists in using a background subtraction algorithm.
0:02:05 This one extracts the silhouettes of the users and of the moving objects in the scene.
0:02:10 Thus we take advantage of the temporal information present in the video streams,
0:02:20 and the decision is based on geometric information.
0:02:24 Also, only a few silhouettes have to be considered,
0:02:26 so it is possible to obtain a lower false alarm rate per image than with the approach based on still images.
0:02:35 So now we will design a probabilistic silhouette classification approach.
0:02:43 When we designed our method, we took the possible errors into account, because we want to obtain a robust method.
0:02:51 When a background subtraction is followed by a connected components analysis, the silhouette masks may present defects.
0:02:58 First, they may present noise on the contours.
0:03:05 Also, the silhouettes of several users or moving objects could be merged.
0:03:13 And, last but not least, carried objects and shadows could also be detected in the foreground.
0:03:19 So we need a robust description technique.
0:03:25 A lot of shape description techniques exist, and several criteria can be used to choose the description technique most adapted to our needs.
0:03:38 The first criterion is related to the use of either the contour points or the entire region of the silhouette.
0:03:48 But you can note that, in every image, noise affects the contours,
0:03:53 so a region-based method is less sensitive to noise and is preferable.
0:03:59 Another criterion is related to the use of global or local attributes.
0:04:08 Global attributes are related to the whole shape;
0:04:16 therefore, global attributes are affected by the presence of defects in the silhouette.
0:04:22 Local attributes split the shape into smaller components, with the hope of limiting the influence of the defects to a few components.
0:04:30 So, in our case, a region-based and local description method is preferable.
0:04:39 The neighborhood is the simplest region-based and local descriptor: the neighborhood of a pixel is the set of all pixels included in a window around it.
0:04:47 But the question is: how can we classify such a set?
0:04:51 Unfortunately, no existing method is available for this task, so we will do it with our own approach.
0:05:02 In our approach, each pixel plays the role of an expert.
0:05:06 In the first stage, each expert decides whether its pixel is part of a human silhouette or not,
0:05:15 and we also assume that it gives the probability for this decision to be correct.
0:05:21 As a matter of fact, the experts can be implemented with machine learning methods.
0:05:27 In the second stage, the opinions given by the experts are merged by a weighted voting rule,
0:05:34 and the weight given to an expert depends on the probability for its decision to be correct.
0:05:40 Indeed, intuitively, we want a confident expert to have an important weight in the final decision.
0:05:49 Here is a small example to illustrate the method.
0:05:54 x is a pixel, and p(x) is the probability for x to be part of a human silhouette; that is the information given by the expert.
0:06:06 In this example, six experts are used to classify the silhouette.
0:06:11 Each of them takes a look at its neighborhood and, based on this observation, gives the probability for its pixel to be part of a human silhouette.
0:06:21 Then all this information is merged into the final decision, that is, the class of the silhouette.
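The first stage described above can be sketched as follows in Python. This is only a minimal illustration, not the authors' implementation: it assumes the descriptor of a pixel is its square neighborhood in the binary silhouette mask, the window size is arbitrary, and `expert.predict_proba` stands in for the probability estimate discussed next; all helper names are hypothetical.

```python
import numpy as np

def extract_neighborhood(mask, x, y, radius=5):
    """Descriptor of a pixel: the binary silhouette mask values inside a
    square window centred on (x, y). Window shape and size are assumptions;
    the mask is zero-padded so border pixels also get a full window."""
    padded = np.pad(mask, radius)
    return padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1].ravel().astype(float)

def pixel_probabilities(mask, foreground_pixels, expert):
    """First stage: one opinion per foreground pixel, i.e. the probability,
    given by the 'expert', that the pixel belongs to a human silhouette."""
    feats = np.array([extract_neighborhood(mask, x, y) for (x, y) in foreground_pixels])
    return expert.predict_proba(feats)[:, 1]
```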
0:06:31 To implement the experts, we have used a powerful machine learning method: extremely randomized trees (ExtRa-Trees).
0:06:40 One of its assets is that it does not require us to optimize any parameter, and it also avoids intrinsic overfitting.
0:06:50 It is an ensemble of decision trees, and in our method each tree votes for one class, human or non-human.
0:06:59 We denote by T the proportion of trees voting for the class human,
0:07:04 and we also consider the total amounts of human and non-human examples in the learning set.
0:07:12 We propose the following estimator for the probability of the pixel to be drawn from a human silhouette.
0:07:24 When the learning set is balanced, this probability is approximately equal to the proportion of trees voting for the class human.
0:07:34 However, when the learning database is not balanced, there is a bias in the decision given by the trees,
0:07:41 and this bias has to be corrected; that is what we do in our probability estimator.
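Since the exact estimator is not spelled out in the transcript, the sketch below shows only one plausible form of such a bias correction, written in Python with scikit-learn's Extra-Trees: the votes of the trees are reweighted by the inverse class counts of the learning set. The function name and the precise formula are assumptions, not the authors' equation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier  # extremely randomized trees

def corrected_human_probability(forest, features, n_human, n_nonhuman):
    """Estimate P(human) for each pixel descriptor in `features`.

    T is the proportion of trees voting for the class 'human' (label 1).
    With a balanced learning set, T already approximates the probability;
    with an unbalanced set, the trees are biased towards the majority class,
    so the votes are reweighted here by the inverse class counts (one
    plausible correction, not necessarily the one used in the talk)."""
    votes = np.stack([est.predict(features) == 1 for est in forest.estimators_])
    T = votes.mean(axis=0)
    return (T / n_human) / (T / n_human + (1.0 - T) / n_nonhuman)

# Hypothetical usage, with labels 1 = human pixel and 0 = non-human pixel:
# forest = ExtraTreesClassifier(n_estimators=100).fit(X_train, y_train)
# probs = corrected_human_probability(forest, X_test,
#                                     n_human=(y_train == 1).sum(),
#                                     n_nonhuman=(y_train == 0).sum())
```

Note that when the two class counts are equal, this estimate reduces to T itself, which matches the balanced case mentioned in the talk.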
0:07:48 So, once we have a probability for each pixel,
0:07:53 we can compute the decision taken by the expert, and the probability for this decision to be correct, using Bayes' rule.
0:08:04 Then, as I said, we use a weighted voting rule to assign a class to the silhouette;
0:08:18 that is equation number four, where w is the weight given to the pixel.
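As a concrete reading of this second stage, here is a minimal sketch of a weighted voting rule; the 0.5 thresholds and the confidence-based weighting are assumptions, since the talk only states that a weighted vote (its equation four) is used, with weights reflecting the probability that each expert's decision is correct.

```python
import numpy as np

def classify_silhouette(pixel_probs, weights):
    """Second stage: weighted vote of the per-pixel experts. Each expert
    votes 'human' when its probability exceeds 0.5, and confident experts
    receive larger weights, so they dominate the final decision."""
    votes = (np.asarray(pixel_probs) > 0.5).astype(float)
    score = np.average(votes, weights=weights)      # weighted fraction of 'human' votes
    return "human" if score > 0.5 else "non-human"

def confidence_weights(pixel_probs):
    """One possible weighting strategy (an assumption): the weight of an
    expert grows with its distance from the undecided value 0.5.  A small
    floor avoids a zero total weight when all experts are undecided."""
    return 2.0 * np.abs(np.asarray(pixel_probs) - 0.5) + 1e-6
```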
0:08:26 Now I will present the dataset we used to assess our method and the results, and give a short conclusion.
0:08:35 Both our learning set and our testing set contain human and non-human silhouettes,
0:08:40 resized to one hundred by one hundred pixels.
0:08:44 This means that our method is scale invariant,
0:08:49 and also that the method can be used with low-resolution images.
0:08:54 Also note that the learning set is not balanced,
0:08:58 but this is not a problem, as our probability estimator accounts for the resulting bias.
0:09:05 Here are the results. The images show the probability computed at each pixel.
0:09:14 These are probability maps, where a white pixel corresponds to a probability of one, whereas a black pixel corresponds to a probability of zero.
0:09:25 As you can see, the human silhouettes appear in white, which means that our method works very well.
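For reference, such a probability map can be assembled from the per-pixel estimates as in the short sketch below (illustrative names; white, 1.0, meaning certainly human and black, 0.0, certainly not).

```python
import numpy as np

def probability_map(shape, foreground_pixels, pixel_probs):
    """Grayscale probability map: background pixels stay at 0, and each
    foreground pixel receives its estimated probability of being human."""
    pmap = np.zeros(shape, dtype=float)
    for (x, y), p in zip(foreground_pixels, pixel_probs):
        pmap[y, x] = p
    return pmap
```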
0:09:36 Once we have computed the probability at each pixel, we also have to assign a weight to each pixel.
0:09:46 In fact, we tried three different weighting strategies, one of them being to learn the weighting function automatically.
0:09:54 These strategies lead to similar results,
0:10:00 and we obtain a correct classification rate of about ninety percent, for both human and non-human silhouettes.
0:10:07 However, this is only a starting point,
0:10:11 because we did not yet try to optimize the set of attributes used to describe a pixel.
0:10:18 Also, to describe a pixel we have to define a neighborhood, and we did not try to optimize the neighborhood shape and the neighborhood size.
0:10:27 So I believe that better results can be obtained with our method.
0:10:33 In conclusion, we have proposed a new method for the detection of humans that is well suited to video streams.
0:10:40 Our approach has been designed to rely on geometric information and to be robust to noise.
0:10:47 In a first step, we apply a background subtraction, which extracts the silhouettes of the persons and of the moving objects in the scene.
0:10:56 Then probabilistic information is computed for each pixel in the foreground,
0:11:03 and finally this information is used to decide whether the silhouette is that of a human or not.
0:11:08 The results show that our approach is promising for the detection of humans in video streams,
0:11:14 but finding the optimal neighborhood used for the description of a pixel is left for future work.
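To summarize the processing chain, here is a minimal end-to-end sketch under stated assumptions: the talk does not name a specific background-subtraction algorithm (OpenCV's MOG2 is used purely as a stand-in), and `pixel_probabilities`, `classify_silhouette` and `weights_fn` refer to the hypothetical helpers sketched earlier, not to the authors' actual implementation.

```python
import cv2
import numpy as np

bg_subtractor = cv2.createBackgroundSubtractorMOG2()  # stand-in background subtraction

def process_frame(frame, expert, weights_fn):
    """Background subtraction -> connected components -> per-pixel experts -> weighted vote."""
    fg_mask = (bg_subtractor.apply(frame) > 0).astype(np.uint8)
    n_labels, labels = cv2.connectedComponents(fg_mask)
    detections = []
    for label in range(1, n_labels):                  # each blob is a candidate silhouette
        ys, xs = np.nonzero(labels == label)
        pixels = list(zip(xs, ys))
        probs = pixel_probabilities(fg_mask, pixels, expert)        # stage 1
        decision = classify_silhouette(probs, weights_fn(probs))    # stage 2
        detections.append((pixels, decision))
    return detections
```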
0:11:20thank you
0:11:27 Thank you, Sebastian.
0:11:29 Any questions?
0:11:38 What about a comparison with the HOG-based pedestrian detection?
0:11:45 The point, and this is my first reply, is that with HOG you have a really huge number of detection windows to be considered,
0:11:58 and this is about twelve thousand windows per image.
0:12:04 Therefore, the false alarm rate per window should be multiplied by that number to obtain the false alarm rate per image,
0:12:15 so it gives a really high false alarm rate per image.
0:12:20 Also, there are techniques to keep only a small amount of responses in the images,
0:12:27 but you still have at least one false detection per image.
0:12:32 This is not acceptable for applications such as video surveillance.
0:12:39 But you could apply the HOG detector, computing the descriptor only on the moving mask, on the moving objects.
0:12:50 Okay, but in this case the descriptor is still computed using colours,
0:13:00 and the appearance of humans in videos is unpredictable: you can have clothes of different colours and textures.
0:13:08 From our point of view, it is preferable to use only the geometric information and the temporal information that we have in the video streams,
0:13:20 and that is why we have chosen to do so.
0:13:26 Yes, maybe a funny question, but,
0:13:36 I mean, it is based on the shape, and you said you want to distinguish between humans and the rest.
0:13:41 So what happens when you put a mannequin next to a person? Have you tried this?
0:13:48 Because, to me, something that has the same shape will be detected as a human, right?
0:13:57 Right.
0:13:58 Okay, so the mannequin would be classified as human.
0:14:03 In fact, you can include mannequins in the negative set,
0:14:10 that is, the set of non-human silhouettes,
0:14:13 and probably, if you do not have too much noise in your images, this will work.
0:14:20 But in real applications there is noise,
0:14:23 and therefore, with such small images, one hundred by one hundred pixels,
0:14:28 I guess you are right and the mannequin would be detected as a human.
0:14:33 But with synthetic images, without noise, it would be possible to distinguish them.
0:14:39 Well, then you would have problems with the clothes and so on.
0:14:44 Okay.
0:14:45 Thanks.
0:14:47 Any other questions?
0:14:50 Do you have any assumptions on how the camera should be placed, compared to the people?
0:14:58 Yes, indeed.
0:15:00 When you build your learning set, you should populate it with silhouettes taken from the same point of view as in the real application.
0:15:12 For example, if in the real application the camera is above the persons,
0:15:17 then you should place, in your learning set, silhouettes taken from the same point of view.
0:15:28 But in practice this is not a problem:
0:15:31 the human silhouettes in the learning set can be generated with a human avatar,
0:15:38 and changing the point of view only takes a few minutes of computation.
0:15:44 And, related to the first question, do you have a sense of how this would perform compared to a trained cascade of classifiers?
0:15:54 There have been approaches that detect humans, or humans as a set of body parts, using a cascade of classifiers,
0:16:04 and it takes a long time to train, but it is not so slow.
0:16:07 At this point we did not compare,
0:16:10 because we hope to obtain better results with our method.
0:16:16 Also, our method has other positive points.
0:16:21 For example, you have information computed at each pixel, which means that, for example, if I hold a guitar in my hand,
0:16:30 it will be detected as being in the foreground by the background subtraction,
0:16:34 but the probability maps make it possible to erase the guitar if one wants, for example, to recover the silhouette alone, or something like this.
0:16:46 So I think our method is well suited.
0:16:53 Well, a last question from you again.
0:16:59 You mentioned that this can be used for video sequences. Have you thought about how to use the temporal information? Because your analysis is frame by frame.
0:17:06 Yes, the temporal information is used implicitly by the background subtraction.
0:17:12 Okay, but my question was whether you think you could use the temporal information on the successive detections, in successive frames, to improve the results.
0:17:23 About that: if you want to, you can apply tracking, for example,
0:17:31 and if you track and number each connected component in the foreground,
0:17:36 you can improve the rate.
0:17:40 I do not know if it is really needed or not; that depends on the application.
0:17:46 You could, for example, use the movements of the arms and the movements of the legs; that could be one possible feature.
0:17:54 Okay, and, just taking a temporal window and stacking the silhouettes over it, you would have a three-dimensional silhouette shape,
0:18:06 and then you could apply the same method, but in place of describing pixels, we would describe voxels.
0:18:18 That's all right.
0:18:19 Thank you very much for all the answers.