0:00:16 Thank you very much for the introduction. We are going to talk about anomaly detection, a topic which has been around for a long time. 0:00:28 The reason I am interested in this topic is that we have a national project whose major objective is to address the following issue: you have built a computer vision system, the main application changes slightly, and what do you do? Do you have to start from scratch, or can you reuse some of the modules? 0:01:04 One of the issues that arises in this context is anomaly detection, because a fully automatic system has to realise by itself that it cannot cope with some input, that it has no competence to interpret the data. So that is the context. 0:01:38 And because of the nature of the project, we also collaborate with the group in psychology at University College London.
0:01:46 So the plan is the following: first the background, then we move on to anomaly detection. We review prior work on anomaly detection, a little bit of its history, and then I will present our position on anomaly detection in scene interpretation systems. 0:02:24 Finally, we apply this to the problem of adapting a video interpretation system. So that's the plan.
0:02:39 If you build a vision system, you are dealing with a system of many stages and modules. If you are not just solving a toy problem, but doing image processing and vision with the aim of developing a system for an actual application, there are many other issues to think about. 0:03:16 You need to collect a lot of training data, because with existing systems you do not know in advance what the relevant conditions are, and you have to optimise the system as a whole. 0:03:44 Just as an example, we will be talking about tennis video analysis. 0:04:02 That was a very brief version of the motivation: the point is that developing a system for an actual application raises all sorts of issues around it.
0:04:37 Okay, so the conference here is concerned with advanced concepts, and in a way, when you develop an interpretation system, that system is advanced in its own right. So I could be talking just about the tennis video annotation system, but my focus will be more on the second bullet point. 0:05:04 As I already mentioned: suppose you want to adapt the system to some other domain, even a quite close domain. The applications I will be talking about are very simple indeed, but nevertheless they raise, I think, interesting issues and challenges.
0:05:24 If you want to benefit from many years of effort, and try to use what you have in order to develop a new competence or capability, then first of all you have to identify that you have a problem, that you cannot cope with some input, and then you have to modify the system in an appropriate way. There are of course other communities working on related questions, for example the transfer learning community in computer vision, which studies whether and how models transfer; 0:06:06 I will not be addressing those issues. But at the end, once you have adapted the system to some new application, and when I say adapt I really mean develop a new capability, then the system needs not only the new functionality. It needs to know in what kind of situation it is operating; it should be able to classify the context in which it operates, so that it can automatically select the appropriate domain knowledge for the interpretation.
0:06:47 So this is the system that we developed. Basically, it can analyse tennis video; let me describe what the system looks like in principle. The objective is that from the video input, completely automatically, you are able to interpret what is going on, to the point of deciding which points are awarded, generating the score in the process. 0:07:19 I am not talking about a Hawk-Eye style installation: we developed a system which works from standard 2D broadcast video, which of course makes the problem more difficult. So in principle, you break the video into shots, and you want to know what is happening in each shot, a rally of ten or so seconds: who actually wins the rally and who should be awarded a point. 0:07:59 Probably, unless you are young and have very good eyes, you will not be able to see the detail here; this slide is just to illustrate the complexity of the system.
0:08:09 The system has quite a few levels of processing. Initially the video is broken into shots, and then each shot is processed separately. Low-level processing deals with the foreground-background separation. Then the key components of the content are extracted, which are the motion of the ball and of the players, and the system then detects important events; one important event is when the ball changes direction. 0:08:58 Eventually there is a high-level interpretation process operating on these events. This is a more digestible summary of the system: basically, the ball tracking is the most important part, you need to know where the ball is; you need to detect the important events; and there is a high-level interpretation part, which is basically hidden Markov model based.
0:09:27 Now, most of the modules in the system use context in some way. When I talk about context here, it is not the domain in which the system operates; it is local context, temporal or spatial. When you want to interpret, for instance, what is going on, you need to know not only where the ball is but also where the players are; that is the interaction between objects in the video. In principle you are interested in interpreting every object in each frame, but the neighbouring objects carry information which is very important: they provide contextual information, and you want to use this information jointly to make the interpretation. So in principle you have some domain knowledge, acquired in some way, hard-wired or learned or partly both. You embed this prior knowledge, and you then compare observations with the model to make the interpretation. This is a very generic picture: most of the modules deal with contextual information. Many modules use contextual information over time, other modules deal with spatial contextual information, and some of them with both.
0:11:02 The first one, for instance, is a module which separates the foreground from the background. You may wonder what happened here, because the players disappear; basically it is a module which builds a mosaic. You take the video frames from a shot and register them to each other, and that basically allows you to build a mosaic in which anything that is moving between frames is wiped out, because it is not consistent information. So you obtain a background, and you can then use the background to separate the foreground. That is one example of the type of functionality that the modules perform.
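The background-building step just described can be sketched as a per-pixel temporal median, a common way to obtain the effect the speaker mentions (anything moving is wiped out). This toy assumes the frames are already registered to each other, i.e. a static camera; the actual module also performs the registration, which is not shown here:

```python
import numpy as np

def build_background(frames):
    """Estimate a static background as the per-pixel temporal median.

    Anything that moves between frames (players, ball) is inconsistent
    over time and is therefore removed by the median.
    """
    stack = np.stack(frames, axis=0)          # shape (T, H, W)
    return np.median(stack, axis=0)

def foreground_mask(frame, background, thresh=25):
    """Pixels that differ strongly from the background are foreground."""
    diff = np.abs(frame.astype(float) - background)
    return diff > thresh

# Toy example: a flat scene with one 'moving object'.
frames = []
for t in range(9):
    f = np.full((8, 8), 100.0)
    f[t % 8, 3] = 255.0       # moving blob, a different row each frame
    frames.append(f)

bg = build_background(frames)
mask = foreground_mask(frames[0], bg)
print(mask.sum())   # only the blob pixel of frame 0 is foreground
```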
0:11:58 The most important one: once you have the ball trajectory and the players, you can use them to detect the events. You can see the ball tracking process here; it also detects when the ball changes direction, and you know where the court is, because it has been detected automatically from the picture; it is a fully automatic system. You can also detect the players, and from all of that you can derive the events. 0:12:35 So these are the events that we have extracted in time, and the sequence of these events, and the positions where they happen, determine what is going on. You have a hidden Markov model, which models the temporal structure of tennis games in general, and that allows you to interpret what is going on, and you can then decide who should be awarded the point at the end.
0:13:10 Okay, so this is an example of what the system would produce. On the left-hand side it actually tells you what is going on, who was awarded the point, and at what time.
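The hidden-Markov-model interpretation mentioned above can be illustrated with a minimal Viterbi decoder over detected events. The two states, the event alphabet, and all probabilities below are invented for illustration; this is not the model of the actual tennis system:

```python
import numpy as np

# Hypothetical 2-state model of a rally; observations are event codes:
# 0 = hit, 1 = bounce, 2 = ball out of play. All numbers illustrative.
states = ["in_play", "point_over"]
start = np.array([0.95, 0.05])
trans = np.array([[0.90, 0.10],
                  [0.05, 0.95]])
emit = np.array([[0.60, 0.35, 0.05],    # in_play emits mostly hits/bounces
                 [0.05, 0.25, 0.70]])   # point_over emits mostly 'out'

def viterbi(obs):
    """Most likely state sequence for a sequence of event codes."""
    T, N = len(obs), len(states)
    logd = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + np.log(trans)   # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [states[s] for s in reversed(path)]

print(viterbi([0, 1, 0, 1, 2, 2]))
# -> ['in_play', 'in_play', 'in_play', 'in_play', 'point_over', 'point_over']
```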
0:13:26 Okay, so as I said, we spent three years developing the system, and we were just working with one video, and it happened to be a video of singles. Then somebody asked the question: what would happen if you actually applied it to doubles? It is a very simple, small transition, but nevertheless a significant enough transition for the system to fail. 0:14:02 And it is not only a question of whether the system fails; you also would like to know when it fails, why it fails, and whether you can learn something from it. Anyway, the question is: what are the mechanisms that the system needs in order to realise that it is actually no longer competent to perform a certain functionality, and how can this functionality be extended?
0:14:31 As already mentioned, this is the project that has been motivating the work in this area, and I think I have already alluded to the mechanisms that we need: to detect anomalies, to transfer knowledge, to adapt the interpretation processes, and to acquire new competences that way. 0:15:02 These are the mechanisms, and what I am going to focus on is anomaly detection; I have already talked for twenty minutes and I have not yet started the topic of the lecture. So these are the mechanisms that would normally be needed, but the one I will discuss is anomaly detection.
0:15:22 Let us look at the definition of anomaly to start with. As normally understood, an anomaly is something deviating from normality, but how normality is defined is left very general: it can be some sort of order, it can be a statistical norm, it can be a rule, whatever. 0:15:48 There are also many synonyms, and interestingly, while some of these synonyms simply mean deviation from normality, sometimes they carry an additional nuance; think for instance of irregularity, or of novelty. There is a difference between anomaly and novelty, because novelty usually implies a change of concept: you are moving to some other model or process. 0:16:30 Now, what is the conventional model? I think everybody, when looking for anomalies, normally thinks in terms of outliers of some distribution. You have a Gaussian, for instance, and when you make observations far away from it, you say they must be anomalous observations, because they are not consistent with my model of the data, with the experience that I have accumulated in the past. 0:17:11 So basically the mathematical model is, in principle, a statistical one. Sometimes you work not with a single observation but with multiple observations, and then you may be interested in whether the distribution of all your observations is different from the distribution of your model; so you could also be talking about anomaly in terms of the shape of the distribution.
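The conventional outlier notion described here is easy to make concrete. A minimal sketch, assuming a one-dimensional Gaussian model of past experience and the classic k-sigma rule (all numbers illustrative):

```python
import numpy as np

# 'Normal' past experience, modelled as a Gaussian (illustrative numbers).
rng = np.random.default_rng(1)
train = rng.normal(loc=5.0, scale=1.0, size=10_000)

mu, sigma = train.mean(), train.std()

def is_outlier(x, k=3.0):
    """Classic k-sigma rule: anomalous if |x - mu| > k * sigma."""
    return abs(x - mu) > k * sigma

print(is_outlier(5.4))   # consistent with the model
print(is_outlier(11.0))  # deep in the tail -> anomalous
```

For the multiple-observation case the speaker mentions, one would instead compare the whole observed distribution with the model distribution, as shown later for the player-count histograms.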
0:17:49 As I said, anomaly detection has been of interest for a long time. In the statistics domain it goes back to the nineteenth century: people have been interested in developing normality models, Gaussian models, for modelling various sets of data observations, and an anomaly is detected when an observation is not consistent with that model. Over the last hundred years or so, most of the work has been focusing on this type of anomaly concept, and there are excellent surveys which make it quite easy to review. 0:18:33 Recently, quite a lot of work on anomaly detection comes from the security and surveillance communities, as they are very much interested in formulating the problem of detecting something unusual as an anomaly detection problem. But although they may be using quite complex systems, most of the notions of anomaly in these papers are very close to the statistical notion: even with a complex system, images, multiple layers of interpretation, very often people still view anomaly through these simple models. 0:19:14 This can be represented in a very simple way, as here. This is your basic system, which is performing some task: you have sensing, you have usually a single-hypothesis model, say a probability distribution, and from this you derive some action. You are interested to know whether there is any anomaly, so you need some sort of anomaly detector, which usually would be some sort of outlier detector, and if the observation is an outlier, then hopefully that will affect the action, so that you do not perform what you would normally perform.
0:19:56 Now, in a complex system like the tennis video system, you need a model like this for every module. Many of these modules are dealing with multiclass problems, so you do not have just a single hypothesis; you have multiple hypotheses, which also introduces interesting complexity into the equation. You have many levels of processing, and some of these modules are relaying high-level information top-down, using contextual information; so although two modules may be interpreting the same sort of event, they will be using different sources of information. 0:20:45 All these complexities are somehow not captured by this conventional anomaly detection model. So, as already mentioned, this is the list of issues: we have multiple models, not just a single one with two hypotheses.
0:21:09 Importantly, in machine perception we very often use discriminative approaches rather than generative ones. If you use a discriminative approach, you cannot really talk about outliers, because you only know whether things fall on the right side of the decision boundary or not; any notion of whether the observation you are trying to classify is an outlier is lost to the system. So if you wanted to detect anomalies, you would need to use both: discriminative models to get better performance, but also maintain a generative model to know what is going on, whether you are actually competent to make that decision. 0:21:59 Also, you very often have areas of the observation space where you have genuine ambiguity, and the decisions you make there you have to be very careful about: you cannot necessarily interpret them as anomalies, because in an ambiguous situation you cannot have confidence that you are dealing with an anomalous observation.
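The point about keeping both kinds of model can be sketched as follows: a discriminative classifier makes the class decision, while a generative density fitted to the same training data acts as a competence check that can veto it. The data, the single-Gaussian density, and the 1% rejection threshold are illustrative choices, not the speaker's system:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two 'normal' classes seen at design time.
X0 = rng.normal([-2.0, 0.0], 0.5, size=(200, 2))
X1 = rng.normal([+2.0, 0.0], 0.5, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Discriminative model: good at picking a side of the boundary,
# but says nothing about whether the input is familiar at all.
clf = LogisticRegression().fit(X, y)

# Generative companion: log-density of the training data under a
# single Gaussian fit (a crude competence model).
mu = X.mean(axis=0)
cov = np.cov(X.T)
inv, logdet = np.linalg.inv(cov), np.log(np.linalg.det(cov))

def log_density(x):
    d = x - mu
    return -0.5 * (d @ inv @ d + logdet + 2 * np.log(2 * np.pi))

threshold = np.quantile([log_density(x) for x in X], 0.01)

def decide(x):
    if log_density(x) < threshold:
        return "anomaly"                    # not competent: reject
    return f"class {clf.predict([x])[0]}"   # competent: classify

print(decide(np.array([-2.1, 0.1])))   # familiar -> class 0
print(decide(np.array([0.0, 40.0])))   # far from training data -> anomaly
```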
0:22:28 Contextual reasoning I have already mentioned; existing anomaly detection approaches are not ready yet to deal with that, nor with hierarchical representations. And there are two more things. First, data quality: you need to know whether the observation data you want to interpret is of the same quality as the data with which the system was designed. You make certain assumptions about the quality of the data, and if that quality changes, the system has to differentiate between that situation, in which it would simply start making errors, and a genuinely anomalous situation, where, given good-quality data, you can be pretty confident that something unusual in the image is going to be an anomalous observation.
0:23:29 One more point, because it arises very often: a potentially anomalous situation can be introduced by your own interpretation process, because you want that process to be as fast as possible. For instance, if I am interested in object recognition, and I know there are, I don't know, half a million objects, or at least a hundred thousand if you count the nouns in a dictionary, it would be completely foolish to have a system which considers every single object from that hundred thousand at every place. 0:24:09 So you prune the list to something manageable, and hopefully deal with, say, a hundred hypotheses rather than a hundred thousand. But if you do that, then you may observe something which is anomalous by your own decision, because you have simplified the system's processing strategy by making the assumption that the object will come only from this subset. If it does not, then you should be able to detect that, recognise the situation, and do something about it: you can then inject more hypotheses into the system, if none of the existing hypotheses is adequate.
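A minimal sketch of this pruning trap and its remedy: match against a shortlist of prototype hypotheses, reject when nothing is close, and only then inject further hypotheses. The prototypes, the distance measure, and the threshold are all invented for illustration:

```python
import numpy as np

def recognise(x, hypotheses, reject_dist=1.0):
    """Match x against a pruned hypothesis set; reject if nothing is close.

    Rejection is the cue that the object may lie outside the subset the
    system assumed, so that more hypotheses should be injected.
    """
    best, d_best = None, np.inf
    for name, proto in hypotheses.items():
        d = np.linalg.norm(x - proto)
        if d < d_best:
            best, d_best = name, d
    return best if d_best <= reject_dist else None

# Illustrative prototype feature vectors (not from any real system).
shortlist = {"racket": np.array([1.0, 0.0]),
             "ball":   np.array([0.0, 1.0])}
extra     = {"umbrella": np.array([5.0, 5.0])}

x = np.array([4.9, 5.1])
label = recognise(x, shortlist)
if label is None:                                   # outside the assumed subset
    label = recognise(x, {**shortlist, **extra})    # inject more hypotheses
print(label)   # -> umbrella
```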
0:25:00 I talked about the deficiencies of the conventional anomaly concept; let me just show you more examples of the different natures of anomalous situations. Very often one is asked to solve the problem of spotting the difference, and you can consider that as an anomaly detection problem too. In this particular picture we have a nice little object, and I think everybody can spot that one difference is the head of the cat in the second picture. Are there any other differences? 0:25:44 Very good, yes: this object has a slightly different angle. Any others? Yes, and this one is a little bit shifted. Very good, so we are very good anomaly detectors. 0:26:02 The first one was perhaps not all that obvious, but once the differences are pointed out, this is a very simple comparison, and for that sort of comparison computer systems are extremely good; they are able to detect the differences easily. So that is one example; we have already talked about distribution drift, and we have talked about novelty. 0:26:35 Now, what about this case: are there any anomalies? 0:26:52 Well, actually there are no differences. The only difference, which perhaps an observer with very acute vision would notice, is a difference in noise: the second image has been compressed, so you lose a little bit of the high-frequency information. Obviously the compression introduces noise, and if I have an anomaly detection system which detects differences based on some assumed distribution, and suddenly the noise characteristics change, it will flag differences everywhere; but this should not be detected as an anomaly. So data quality is an extremely important concept in this process.
0:27:43 I have already talked about contextual information, and about hierarchical representations, which also exploit contextual information. So here: every object in this image, which is a famous painting, makes sense when taken in isolation, but the relationship between these objects is obviously unusual, because you would not expect a locomotive to be jumping out of the fireplace. 0:28:20 So it is another example of the type of anomaly that you would like to be able to detect, and that the system should be able to exploit. This, then, is the conventional anomaly detection scheme that people have been using for almost a hundred years.
0:28:40 And this is probably what we need. What is the difference between the two? This is the actual functioning system, implemented in some applications, and this is the same thing: the blue box with sensing and action. 0:29:04 The difference between this and that is that we have multiple hypotheses for each module, and we also have several layers of interpretation, not just a single layer. The higher levels would be using context, that is, the relationships between the lower-level entities. So if you want to detect anomalies in a sensible way, you then need the following: you need something that deals with the differences between contextual and non-contextual processing, and that is the so-called incongruence detector. 0:29:52 Let me go back to my scene graph. In principle I am trying to interpret every object, but when I am interpreting one object in the graph, I am using the contextual information provided by the other objects. So in principle you can interpret that object in two different ways: first, just using the measurement information relating to that object; and secondly, using the measurement information plus, possibly, prior knowledge about the configuration, the contextual information provided by the neighbours, which will have an impact on the interpretation of that object. 0:30:48 So we have contextual and non-contextual interpretation, and you can then measure the incongruence between those two.
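One simple way to quantify the incongruence between the two interpretations is sketched here as a symmetrised KL divergence between the non-contextual and the contextual posteriors over labels. The labels and all numbers are illustrative, not taken from the tennis system:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def incongruence(p_noctx, p_ctx):
    """Symmetrised KL between the two interpretations of the same object."""
    return 0.5 * (kl(p_noctx, p_ctx) + kl(p_ctx, p_noctx))

# Posteriors over labels {player, line_judge, ball_boy} for one object.
# Non-contextual: from the object's own measurements only.
p_obj = np.array([0.80, 0.15, 0.05])

# Contextual: same measurements plus the configuration of the neighbours
# (e.g. position on the court, number of other agents).
p_ctx_agree    = np.array([0.85, 0.10, 0.05])
p_ctx_disagree = np.array([0.05, 0.90, 0.05])

print(incongruence(p_obj, p_ctx_agree))     # small: congruent
print(incongruence(p_obj, p_ctx_disagree))  # large: trigger anomaly checks
```

A large incongruence value is exactly the cheap trigger mentioned later in the talk: only when it fires do the expensive outlier-detection modules need to run.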
0:31:03 But we need two other things. We need to assess, both for the non-contextual interpretation and for the contextual one, whether we are dealing with ambiguity: how much confidence we actually have in the interpretation we are making. That is one of the things that needs to be added, in addition to incongruence. 0:31:27 We also need a module which is assessing the data quality, because that module tells us whether we really should be looking for anomalies at all. Even if we detect something spurious, should we consider it an anomaly? If the data quality has changed, then we should not simply declare an anomalous situation, because the change, and the incorrect decisions that follow from it, may be induced by data of a different quality, and we should know about that.
0:32:09 And in addition to all that, we still need the anomaly detection processes, the outlier detection processes. Even if my non-contextual and contextual decision-making processes are functioning well, and to function well they would probably be based on discriminative models, I will still need some way of deciding whether the observations are consistent with the models or not, whether they are outliers. So I still need the conventional model of anomaly. 0:32:48 You can see these two blocks here, the non-contextual one and the contextual one, but hopefully I will not be invoking them very often, because if I ran them all the time, the system would just be computationally complex. Ideally, what you would like to do is to trigger the processing in these modules, looking for anomalies, only when there is reason to do so, and this triggering can be done quite efficiently via the incongruence detection process.
0:33:28 So you can see that one of the mechanisms, and only one, there are others, that we need for detecting anomalies in scene perception systems is the incongruence detector. Interestingly, one of the original works in this area was done in the speech area; I do not know whether Brno was actually involved in this, or whether it was just one of yours. 0:34:02 Okay, so it was work with Hynek Hermansky on the problem of out-of-vocabulary word detection, which is exactly a typical example of the problem we are dealing with. You have a speech system which is processing data, detecting phonemes, so we have a non-contextual interpretation, and a contextual one which combines the phonemes into words, and you may be interested in detecting whether there is any anomaly. It would be an anomaly if, for instance, the phoneme detector functions very well and gives you very strong confidence in the interpretation, but the word-level interpretation produces garbage, simply because the word does not exist in the dictionary. 0:34:56 So this is a nice example of the situation that we would like to detect, and there was a five-year project, the DIRAC project funded by the EU, which has been extending this basic idea to the image domain, and which also continued with applications in speech; that line of work was then extended further, and most of the relevant publications appeared in the subsequent years, around 2010 to 2012.
0:35:45 This is a little bit of the background. As I said, our work is not directly focused on incongruence detection itself: how you detect that there is a difference between, say, a generic and a specific classifier, or between the contextual and the non-contextual one, depends on the application, and so does the implication of detecting such an incongruence; that is what DIRAC has produced. But when we actually tried to use this in our own work on tennis video interpretation, it was not, on its own, sufficient. We were very often dealing with situations where the decisions were ambiguous, and then you cannot conclude from the incongruence that you are dealing with an anomalous situation. We also dealt with situations, and we will see that in a minute, where we had several videos of tennis, 0:36:53 even several videos of singles, and they all had a different court, they were from different tournaments, they had been recorded in different conditions, and some of them were noisier than others. It was pretty clear that you need to know something about data quality if you want to make sensible decisions about anomaly. 0:37:15 We still needed the original technology, so to speak, of anomaly detection, the outlier detection processes, and distribution monitoring is also needed, to measure whether distributions have shifted.
0:37:39 Now, with this architecture of the system, the state of affairs is quite interesting, because based on the outcomes of the analysis in the various modules of the anomaly detection system, you can then classify the anomalous situations and recognise different states. You can obviously recognise the state where there is no anomaly, but you can also identify situations where you are dealing with noisy measurements, detect situations where you have unknown objects, or where you have incongruent or congruent labelling. So the various types of anomaly can be detected, and you get a much better idea of what is going on. 0:38:34 Ideally, what we want to do is to start with tennis and move on to badminton: detect, or identify, the modules that do not have the competence to deal with the input data, and try to correct those modules, adapt them or inject knowledge, so that we can actually use the system for the new application.
0:39:05 But we started with something very simple, as I said: just switching from singles tennis to doubles, a very simple situation. If you consider that problem, what would you expect? First of all, in doubles there are twice as many players. 0:39:29 Also, the court that is being used for the game is wider: you have the tram lines, which are basically out of play in the case of singles, but in the case of doubles they are in play. Everything else stays the same; the rules are the same. That made it quite a nice challenge, because it was not too complicated, but at the same time it was quite interesting to see what happens. Now, in principle you would say:
0:40:09well it's obvious well can just count the players and the drop is done about
0:40:14the impact is anybody who works and you or working in on images or video
0:40:21you know that the tech T and count been objects it's not as simple as
0:40:25that uh well lee because
0:40:31 the vision processes are not perfect, and partly because the application domain allows for variation.
0:40:44 This is not a black-and-white situation, so to speak:
0:40:49 it is not simply either two or four players in the game; there are other
0:40:53 moving objects. You have line judges, for instance, and normally they stand
0:40:57 still, so when you do motion detection
0:41:02 they stay in the background. But sometimes they move, and if they move they
0:41:07 suddenly become moving objects, and unless you have some sophisticated mechanism for distinguishing
0:41:15 between players and other moving objects, you are stuck with a different count. Then
0:41:21 you have extra balls: the serve is played, it goes out, and
0:41:28 the ball boy runs to collect it, so you may have five
0:41:34 objects detected. So if you actually look at the statistics of a video,
0:41:41 this is what you would observe
0:41:46 for singles: most of the time you detect just two
0:41:51 agents, two moving players, but on many occasions you detect
0:41:57 three, and sometimes up to five. So we have a distribution, and equally
0:42:03 for doubles you have a distribution. So you have two sets of these, and
0:42:08 you look for anomaly on the basis of distributions rather than single observations.
0:42:15 So we are basically trying to differentiate between
0:42:21 two distributions, one which is the normal distribution and one which is the anomalous distribution,
0:42:26 and we look for differences between them. Anyway, that is what we have done.
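The distribution-based check described above can be sketched in a few lines. This is a minimal illustration of the idea, not the speaker's actual implementation: the frame counts, the total-variation distance, and the threshold are all my assumptions.

```python
from collections import Counter

def count_histogram(counts, max_count=6):
    """Empirical distribution of per-frame detected-object counts,
    with a small smoothing floor so no bin has zero probability."""
    c = Counter(counts)
    total = len(counts) + 1e-3 * (max_count + 1)
    return [(c.get(k, 0) + 1e-3) / total for k in range(max_count + 1)]

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def is_anomalous(window_counts, normal_hist, threshold=0.3):
    """Flag a window of frames whose count distribution departs from normal."""
    return total_variation(count_histogram(window_counts), normal_hist) > threshold

# toy data: singles frames mostly show 2 moving agents, doubles mostly 4
singles_frames = [2] * 80 + [3] * 15 + [5] * 5
doubles_frames = [4] * 70 + [5] * 20 + [3] * 10

normal = count_histogram(singles_frames)
print(is_anomalous(singles_frames, normal))   # False: same domain
print(is_anomalous(doubles_frames, normal))   # True: the distribution has shifted
```

The point of comparing whole windows rather than single frames is exactly the one made in the talk: an occasional ball boy or moving line judge perturbs individual counts, but only a sustained shift of the distribution signals a new domain.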
0:42:31 It is a fairly standard approach. And here we have, not yet
0:42:37 the results, but the data that we used. As you can see, we
0:42:41 have five videos of different lengths, so they are not necessarily complete matches,
0:42:47 but they do cover different situations: we
0:42:52 have an Australian tournament, a Japanese tournament, and US women's and men's singles and doubles, and these
0:43:03 are the numbers of plays in each.
0:43:10 And here we have some results. What we show you here is that,
0:43:18 as we are comparing distributions, if you were using information from just one
0:43:23 shot, then this is the performance that you would get
0:43:28 for the various scenarios. Basically, here we are talking about the
0:43:36 detection of anomaly. We trained on singles, and when I talk about
0:43:43 anomalies I always mean that any training that is done is
0:43:50 done only in the normal situation. There are many cases where people
0:43:54 actually try to synthetically generate anomalies, to create anomalies, but
0:44:01 I think that is a fundamentally wrong approach, because if you design
0:44:08 a system you cannot possibly collect data for all anomalous situations;
0:44:14 they would just become new classes. So really
0:44:20 the appropriate way of thinking about it is that you can only train
0:44:24 the system with the normal data. So our training was
0:44:30 done only on singles. We measured the level of noise, and you can see, for
0:44:35 instance, that one of the men's singles videos had much higher noise
0:44:42 than the other two, and that
0:44:49 has a serious implication, because if you look at the data
0:44:54 you can see what happens when you train. So here we have this
0:44:59 information: here we trained on the Australian women's singles and the Japanese singles.
0:45:06 You can see that
0:45:10 if you train on good-quality data and then you try to
0:45:17 test the system with data of a different quality, then you have problems. You
0:45:21 can see that from this histogram, because this is basically the anomaly detection output for
0:45:27 the singles, where we should not be detecting any anomalies, because we are
0:45:31 dealing with the same domain the system was trained on: to recognise
0:45:36 and interpret tennis singles. Yet here we are actually having a problem: because
0:45:43 of the noisy conditions, we are detecting false anomalies. Whereas
0:45:50 when we actually use training data which is a little bit
0:45:54 noisier,
0:45:57 none of the test singles throws up any anomaly. But then we
0:46:02 have to do a bit more integration to actually get the
0:46:08 anomaly detection done correctly. So that also shows you that
0:46:14 one has to be very careful about data quality and its implications
0:46:18 on the anomaly detection process.
0:46:21 The second task, well, the second anomaly that
0:46:27 we can analyze, is that the ball goes out into the tram lines,
0:46:32 okay, and
0:46:37 so the game should terminate,
0:46:40 but it doesn't; it just carries on.
0:46:44 Again we have developed a solution. So what do we have? Well,
0:46:51 we had to be very careful to make sure that the anomaly labels
0:46:58 ruled out situations which may be genuinely ambiguous because
0:47:05 of the data and the system itself: anything very close to the boundary line
0:47:10 between the tram line and the singles court was ambiguous, but the further away
0:47:16 you get from the boundary line, the more confidence you have.
0:47:19 So we have devised a confidence measure,
0:47:23 which we use as a filter, to make sure that we are not trying to make
0:47:28 decisions about anomaly on data which is by its very nature
0:47:33 ambiguous.
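The confidence filter just described can be sketched as follows. This is a hypothetical illustration of the idea only: the talk does not give the actual measure, so the linear ramp, the saturation distance, and the threshold are my assumptions.

```python
def line_call_confidence(distance_to_line_m, saturation_m=0.20):
    """Confidence in an in/out call, growing with distance from the boundary.
    A ball right on the line is maximally ambiguous (confidence near 0)."""
    return min(abs(distance_to_line_m) / saturation_m, 1.0)

def filtered_call(distance_to_line_m, min_confidence=0.5):
    """Return 'out', 'in', or None when the observation is too ambiguous
    to support any anomaly decision (positive distance = outside the line)."""
    if line_call_confidence(distance_to_line_m) < min_confidence:
        return None          # withhold judgement near the boundary line
    return "out" if distance_to_line_m > 0 else "in"

print(filtered_call(0.02))   # None: too close to the line to decide
print(filtered_call(0.30))   # 'out'
print(filtered_call(-0.25))  # 'in'
```

Returning `None` rather than a forced label is the essential design choice: ambiguous observations are removed from the anomaly pipeline entirely instead of being allowed to generate false alarms.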
0:47:38 Coming back again to my point: we are always using only the information that
0:47:45 we acquire in the normal operation of the problem, normal serves and normal
0:47:50 shots, and so basically the normal
0:47:54 interpretation, and the interpretation process associated with it. We have not really
0:48:00 designed the system specifically to detect anomalies; it does its normal
0:48:06 processing and detects anomalies as a result of that.
0:48:14 This is just an illustration of the interpretation process.
0:48:21 So when there is a perceived anomaly,
0:48:24 that is, when the system thinks that the ball is out but
0:48:28 actually the game continues, we follow all the possible interpretations,
0:48:34 all the possible interpretations that may happen, and on the basis of
0:48:40 that we are able to make a decision about whether there is an
0:48:44 anomaly, because the game continues without pausing.
0:48:53 The detection is based on measuring incongruence between
0:48:57 contextual and non-contextual labels, basically. We have an event detector which
0:49:04 gives you the non-contextual labels, and we have the context
0:49:08 module, which takes into account the sequences of events over time. So, as
0:49:17 is normally the case, as I already explained, you have
0:49:21 two interpretations, one contextual and one non-contextual, and you have to measure whether they
0:49:27 are incongruent.
0:49:28 One possible way of measuring this is using some Bayesian surprise measure, which
0:49:34 takes the form of a divergence on discrete distributions of labels. But the problem
0:49:41 with that measure is that it is very sensitive: if you have a
0:49:47 probability which moves from 0.95 towards 1, then suddenly you move towards
0:49:55 infinity, and of course this causes trouble. So we have actually adapted
0:49:59 that measure and used something which was, in practice, much more effective:
0:50:06 we chose the top label, the best supported label, for each
0:50:13 of the contextual and non-contextual hypotheses, and just measure the difference between those two.
0:50:18 You can actually show that, in the two-class case that we consider in
0:50:23 this particular application, whether the ball was out or not, we ended
0:50:30 up with a very simple way of measuring incongruence between the states.
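A sketch contrasting the two measures discussed above. The KL-based surprise and its blow-up near zero probabilities follow the talk; the exact form of the top-label comparison is my assumption, since the talk does not give the formula.

```python
import math

def bayesian_surprise(p, q):
    """KL divergence between discrete label distributions.
    Unbounded: it explodes as any q_i approaches 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def top_label_incongruence(contextual, non_contextual):
    """Bounded alternative in the spirit of the talk (exact formula is my
    assumption): zero when both interpretations back the same label,
    otherwise the confidence both sides invest in conflicting labels."""
    i = contextual.index(max(contextual))
    j = non_contextual.index(max(non_contextual))
    if i == j:
        return 0.0
    return min(max(contextual), max(non_contextual))

ctx = [0.95, 0.05]   # context model: ball in, so play should continue
obs = [0.10, 0.90]   # event detector: ball out

print(top_label_incongruence(ctx, obs))          # 0.9: strong incongruence
print(top_label_incongruence(ctx, [0.8, 0.2]))   # 0.0: interpretations agree
# the KL-based surprise grows without bound as any q_i -> 0:
print(bayesian_surprise([0.5, 0.5], [1.0 - 1e-12, 1e-12]))
```

The bounded measure stays in [0, 1] no matter how peaked the posteriors become, which is exactly the robustness property the divergence lacks.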
0:50:36 When we did that on the videos that we trained with (again, we trained
0:50:42 only on singles), we had no anomalies detected on singles, so no problem, as you
0:50:47 would expect. Then on doubles, with the current system, whatever limitations it
0:50:54 has, we certainly detected some anomalies.
0:50:58 Some went undetected, not many, and there was a small number of false positives. Then
0:51:07 you associate the anomalies with the part of the court where they happen:
0:51:13 they identified the tram lines. So it was very nice, and it was very easy to
0:51:19 use that association. We have another paper elsewhere which
0:51:26 takes the output of this anomaly detection
0:51:32 module and, through this association, is able to define a rule base, basically
0:51:38 saying: well, to remove the anomalies, the court size has to change, and
0:51:45 the system has to use the tram lines to be able to interpret
0:51:50 this content successfully.
0:51:54 I think I have
0:51:56 talked about, and given you examples of, all the mechanisms that are needed for
0:52:02 anomaly detection, as exercised by this application. In principle you need context detection, which
0:52:08 is really domain detection, as well as a protocol for the system
0:52:15 to acquire new competence. Once it has that, it has to be able to
0:52:19 pick out which domain it is dealing with at a given moment and
0:52:25 take the appropriate knowledge base; this is basically what is used
0:52:30 in the interpretation by the high-level module. And this is the
0:52:34 anomaly detection mechanism. That is the module that is still needed and
0:52:40 that would be added to the system to
0:52:44 interpret doubles successfully. So that brings me to my conclusion. I hope that I have demonstrated
0:52:50 to you that anomaly detection in machine perception requires more mechanisms than
0:52:56 what is normally offered by the conventional model,
0:53:02 what these mechanisms are, and how useful they are in practical applications. Thank you very much for your attention.
0:53:50 [audience question, largely inaudible]
0:53:52 Well, I think that what goes into the anomaly detection system
0:53:57 is generic, but the application is specific. So obviously our solutions will not work
0:54:04 directly for your problem, but I think the notion of data quality
0:54:10 is very important, and also the approach to the problem: one should
0:54:17 be trying to train the system just with the normal data. But there you
0:54:25 also shoot yourself in the foot, because if you have examples of
0:54:28 anomalies, they would help you to improve the design. Nevertheless, such
0:54:33 a system would then only be able to detect what was presented
0:54:38 to it in training, and so there is a little bit of
0:54:42 a dilemma there.
0:55:12 Okay.
0:55:14 Basically, I think all the videos that we used are
0:55:18 from professional matches, and the cameras were not fixed; this is why
0:55:24 we needed to do the motion detection and compensation.
0:55:32 In principle,
0:55:36 we always use the prior information that there is a ground plane, so
0:55:41 based on that information you can calibrate the camera with respect
0:55:47 to the scene, to the pitch. So the camera does not have to be static; it
0:55:52 can move. The solution is not just for a single position
0:55:57 of the camera; you can always recalibrate the system for any position, and
0:56:02 this is what actually happens when the camera moves.
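Ground-plane calibration of a moving camera can be sketched as a homography estimated from known court-line intersections. This is a generic illustration, not the system's actual calibration code: the pixel coordinates below are hypothetical, and only the court dimensions (a singles court is 8.23 m wide by 23.77 m long) are real.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (stdlib only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_points(img_pts, court_pts):
    """Estimate the 3x3 homography (with H[2][2] fixed to 1) mapping image
    pixels to ground-plane coordinates from 4 correspondences, e.g. detected
    court-line corners whose true positions follow from the rules of tennis."""
    A, b = [], []
    for (x, y), (X, Y) in zip(img_pts, court_pts):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y]); b.append(X)
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y]); b.append(Y)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def to_court(H, x, y):
    """Map an image point into court coordinates (metres)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# hypothetical pixel positions of the four singles-court corners
img = [(100, 600), (540, 600), (420, 200), (220, 200)]
court = [(0, 0), (8.23, 0), (8.23, 23.77), (0, 23.77)]
H = homography_from_points(img, court)
print(to_court(H, 100, 600))  # near (0.0, 0.0): the first baseline corner
```

Because only four point correspondences are needed, the homography can be re-estimated whenever the broadcast camera pans or zooms, which is why the approach is not tied to a single camera position.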
0:56:32 I think it was more to do with access. I think the videos
0:56:38 we acquired came through the internet, maybe YouTube, but we didn't
0:56:43 look into it in detail. We knew that it would be difficult to get
0:56:47 copies of the same material from the broadcasters,
0:56:50 although we did have one or two from the BBC, so
0:56:56 again, yeah.
0:57:10 We have not measured it rigorously, but I would say that
0:57:15 with the confidence measure we are probably losing half of
0:57:19 the tram-line events,
0:57:21 where we are not making decisions, because of
0:57:27 the ambiguity, and because the accuracy of the system actually gets
0:57:31 lower for the far part of the court: the further away from the
0:57:37 camera, the more the accuracy is degraded.
0:58:03 Well, I hope it will generate some other anomalies, but that is an
0:58:07 interesting proposition.