I will present a method to detect humans in video streams. The outline of my presentation is as follows. First, I will explain why, from our point of view, the detection of humans in video streams is a binary silhouette classification problem. Then we will design a new robust probabilistic pixel-based approach to classify silhouettes. Finally, I will present the data sets we used to assess our method and the results, and I will give a short conclusion.

So let's start with the definition of the problem of detecting humans in video streams. Detecting humans in video streams is useful for many applications, such as video surveillance. Here, the goal is to detect only the humans in the scene. So the cornerstone of such an application is the ability to classify the observations into two classes: human and non-human.

A first approach to detect humans has been proposed for still images. However, such an approach has several drawbacks. It is based on appearance, and usually the appearance of humans is unpredictable because colours and textures vary. Also, a lot of detection windows have to be considered per image, because we have to look for humans at a lot of positions and at a lot of scales. The consequence is that it is difficult to obtain a low false alarm rate per image.

A better approach consists in using a background subtraction algorithm. This one extracts the silhouettes of the users and of the moving objects in the scene. Thus we take advantage of the temporal information present in video streams, and the decision is based on geometric information. Also, only a few silhouettes have to be considered, so it is possible to obtain a lower false alarm rate per image than with the approach based on still images.

So now we will design a probabilistic pixel-based approach to classify silhouettes. When we designed our method, we took noise into account because we
want to get a robust method. When a background subtraction is followed by a connected components analysis, silhouettes may present defects. First, they may present noisy contours. But also, the silhouettes of several users or moving objects could be merged. And last but not least, carried objects and shadows could also be detected in the foreground. So we need a robust description technique.

A lot of shape description techniques exist, and mainly two criteria can be used to choose the description technique best adapted to what we need. The first criterion is related to the use of the contour points only or of the entire interior of the silhouette. As you can see, in real images the noise affects the contours, so a region-based method is less sensitive to noise and is preferable. Another criterion is related to the use of global or local attributes. With a global method, the attributes are related to the whole shape; therefore the attributes are affected by the presence of defects in the silhouette. Local methods split shapes into smaller components, with the hope of limiting the influence of the defects to a few components. So in our case, a region-based, local shape description method is preferred.

Our shape description method is the simplest region-based, local description method that one can imagine: it is the set of all pixels included in the shape. But the question is: how can you classify such a set? Unfortunately, there doesn't exist any machine learning method suited for this task. So we will do this by treating each pixel as playing the role of an expert. In the first stage, each expert decides whether its associated pixel is part of a human silhouette or not, and we also assume that it gives the probability for this decision to be correct. The experts can be implemented by machine learning methods. In the second stage, the
opinions given by the experts are merged by weighted voting, and the weight given to an expert depends on the probability for its decision to be correct. Indeed, intuitively, we want a confident expert to have an important weight in the final decision.

Here is a small example to understand the method. Each expert is a pixel, and what it produces is the probability for it to be part of a human silhouette; that's the information given by the expert. In this example, six experts are used to classify the silhouette. Each of them takes a look around itself, and based on this local information it gives the probability for it to be part of a human silhouette. Then all this information is merged into the final decision, that is, whether the silhouette is human.

To implement the experts, we have used a powerful machine learning method named ExtRaTrees. It's a method that doesn't require you to optimize any parameter and that also intrinsically avoids overfitting. It's like a forest of decision trees, and in our case each tree votes for one class, human or non-human. We denote the proportion of trees voting for the class human, and the respective numbers of human and non-human silhouettes in the learning set, and we propose the following estimator for the probability of a pixel to be issued from a human silhouette. When the learning set is balanced, the probability is approximately equal to the proportion of trees voting for the class. However, when the learning set is not balanced, there is a bias in the decision taken by the trees, and this bias has to be cancelled. That's what we do in our probability estimator.

Once we have a probability for each pixel, we can compute the decision taken by the expert and the probability for this decision to be correct, using Bayes' rule. Also, as I said earlier, we use a weighted voting rule to
give the class to the silhouette, and that's equation number four, where w is the weight given to the expert.

So now I will present the data sets we used to assess our method and the results, and give a short conclusion. Both our learning set and our testing set contain human and non-human silhouettes, resized to one hundred by one hundred pixels. This means that our method is scale invariant, and also that it can be used with low-resolution images. Also note that our learning set is not balanced, but this is not a problem since our probability estimator cancels the bias.

So here are the results. The images show the probability computed in each pixel; these are probability maps. A white pixel corresponds to a probability of one, whereas a black pixel corresponds to a probability of zero. As you can see, human silhouettes are whiter, and that means that our method works very well.

Once we have computed the probability maps, we also have to assign a weight to each pixel. In fact, we tried three different weighting strategies, one of them being to learn the weighting function automatically. All these strategies lead to similar results, and we have a correct classification rate of around ninety percent for both human and non-human silhouettes. However, this is only a starting point, because we did not yet try to optimize the set of attributes used to describe a pixel. Also, to describe a pixel we have to define a neighborhood, and we didn't try to optimize the neighborhood shape and the neighborhood size. So I believe that better results could be obtained with our method.

So, in conclusion, we have proposed a new system for the detection of humans, well suited for video streams. Our approach has been designed to rely on geometric information and to be robust to noise. In a first step, we apply a background subtraction algorithm to extract the silhouettes of persons and moving objects in the scene.
Then probabilistic information is computed for each pixel in the foreground, and finally this information is used to decide whether the silhouette is that of a human or not. Results show that our approach is promising for the detection of humans in video streams, but finding the optimal neighborhood used for the description of a pixel is left for future work. Thank you.

Thank you, Sebastian. Any questions?

What about a comparison with HOG-based pedestrian detection?

As on my first slide, with HOG you have a really huge number of detection windows to be considered, and this is about twelve thousand windows per image. Therefore, the false alarm rate per window should be multiplied by this number to obtain the false alarm rate per image, so it gives a really high false alarm rate per image. Also, there are techniques to keep only a small number of responses in the images, but you still have at least one false detection per image. This is not acceptable for applications such as video surveillance.

But you can apply the HOG detector or descriptor only on the moving mask, on the moving objects.

Okay, but in this case HOG is computed using colours, and the appearance of humans in videos is unpredictable: you can have a lot of different colours and textures. From our point of view, it's preferable to use only the geometric information and the temporal information that we have in the video streams. That's why we have chosen this approach.

Yes. Maybe a funny question, but your method is based on the shape, and you said you want to distinguish between humans and the rest. So when you put a monkey next to these, I mean, have you tried to distinguish it from a human? Because for me, something that has the same shape will be detected as human, right? So the monkey will be human.

Right, okay. In fact, you can also add
monkeys to the negative set, that is, the set of non-human silhouettes, and probably, if you don't have too much noise in the images, this will work. But in real applications there is noise, and furthermore we use small images, one hundred by one hundred pixels. That's why I guess you are right: the monkey will be detected as human. But with synthetic images without noise, it is possible to distinguish them.

Well, then you have problems with the clothes and so on. Okay, thanks.

Other questions?

Do you have any assumptions on how the camera should be placed with respect to the people?

Yes, indeed. When you build your learning set, you should populate it with silhouettes taken from the same point of view as in the real application. For example, if in the real application your camera is above the person, you should place in your learning set silhouettes taken from the same point of view. But this is in practice not a problem: the human silhouettes in the learning set can be generated with a human avatar, and changing the point of view takes only a few minutes to compute.

And related to the first question: have you got a sense of how this would perform compared to a trained cascade of classifiers? There have been approaches that look at humans as a whole, or at humans as a set of parts, using a cascade of classifiers. It takes a long time to train, but it's [unclear].

Well, at this point we didn't compare, because we hope to have better results with our method first. Also, our method has some positive points. For example, you have information computed in each pixel, which means that, for example, if I hold a guitar in my hand, it will be detected as being in the foreground by the background subtraction, but the probability map allows you to erase the guitar if you want, for example to do pose recovery or something like this. So I think our method will do well.

Last question.

You mentioned again that this can be used for video sequences. Have
you thought about how to use the temporal information? Because your analysis is frame by frame.

Yes, the temporal information is used, in fact, by the background subtraction.

Okay, but my question was whether you think you could use the temporal information from successive detections in successive frames to improve the result.

Yeah, if you want you can apply tracking, for example, and if you track and number each connected component in the foreground, you can improve the rate. I don't know if it's really needed; that depends on the application.

I thought, for example, of the movement of the arms and the movements of the legs; that would be one possible feature you can look at.

Okay. Just take a temporal window, just concatenate the silhouettes, and you would have a 3D volume shape, and then you can adapt such a method, but in place of describing pixels, we will describe voxels.

That's all right. Thank you very much for all the answers.
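
The two-stage scheme described in the talk (per-pixel experts, then weighted voting) can be sketched as follows. This is a minimal sketch under stated assumptions, not the speaker's implementation: the exact estimator and equation number four are not recoverable from the talk, so the bias correction below is a standard class-prior re-weighting, and the default weight `2c - 1` (with `c` the probability that the expert's decision is correct) is a hypothetical choice. All function and parameter names are illustrative.

```python
import numpy as np


def corrected_probability(vote_fraction, n_human, n_non_human):
    """Cancel the class-imbalance bias in a tree-vote proportion.

    ASSUMPTION: a standard prior re-weighting, not the talk's formula.
    vote_fraction: proportion of trees voting "human" (scalar or array).
    n_human, n_non_human: class counts in the learning set (both > 0).
    """
    if n_human <= 0 or n_non_human <= 0:
        raise ValueError("both class counts must be positive")
    a = np.asarray(vote_fraction, dtype=float)
    # Divide the posterior odds a / (1 - a) by the prior odds
    # n_human / n_non_human, then convert back to a probability.
    num = a * n_non_human
    return num / (num + (1.0 - a) * n_human)


def classify_silhouette(pixel_probs, weights=None):
    """Merge per-pixel expert opinions by weighted voting.

    pixel_probs: probability, per foreground pixel, of belonging
                 to a human silhouette.
    weights:     optional per-pixel weights; the default is a
                 HYPOTHETICAL confidence-based weight.
    Returns (label, score), with label "human" if score > 0.
    """
    p = np.asarray(pixel_probs, dtype=float).ravel()
    votes = np.where(p >= 0.5, 1.0, -1.0)   # decision of each expert
    confidence = np.maximum(p, 1.0 - p)      # P(decision is correct)
    if weights is None:
        weights = 2.0 * confidence - 1.0     # 0 if unsure, 1 if certain
    score = float(np.sum(np.asarray(weights, dtype=float) * votes))
    return ("human" if score > 0 else "non-human"), score


if __name__ == "__main__":
    # Learning set with three times more human than non-human samples.
    probs = corrected_probability([0.9, 0.8, 0.6, 0.3], 300, 100)
    print(classify_silhouette(probs))
```

With a balanced learning set, the correction leaves the vote proportion unchanged, which matches the behaviour described in the talk; with an unbalanced set, it pulls the probability away from the over-represented class. Confident experts dominate the vote, while pixels with a probability near one half contribute almost nothing.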