Speech Transcript - Anomaly detection in machine perception systems

0:00:16	very much for the introduction
0:00:18	uh
0:00:20	we talk about anomaly detection
0:00:24	which is a topic which has been around for a long time
0:00:28	uh the reason why i'm interested in this topic is that the
0:00:34	so we have a national
0:00:37	project
0:00:38	the major object
0:00:40	which it is addressing the issues both you based on the computer vision system
0:00:47	other
0:00:51	the main application slight changes
0:00:54	what do you do you have to start from scratch
0:00:58	all you have to can you use some of the models and uh one of
0:01:04	the issues that one
0:01:07	in this context
0:01:08	is uh anomaly detection because
0:01:11	the system has no
0:01:14	that if it is fully automatic system
0:01:16	because you know that the it cannot cope with
0:01:21	the main in both uh that uh because no competence to in that the
0:01:28	since the data so that's the context and because it's a reasonably project
0:01:38	we in the groove in psychology of the community college london
0:01:46	so the plan is the
0:01:48	stop
0:01:50	the background then we move on to anomaly detection
0:01:55	uh
0:01:57	we review uh prior art on anomaly detection and that
0:02:03	a little bit of it is
0:02:04	all
0:02:08	approaches
0:02:10	and that will then be all position
0:02:16	yeah
0:02:18	solely on anomaly detection
0:02:20	section system channel
0:02:23	and the
0:02:24	we apply
0:02:27	oh set
0:02:29	the problem
0:02:30	you know
0:02:34	interpretation system
0:02:37	so that's plan to
0:02:39	so if you
0:02:43	this on vision system we present system the difficult to a stage is and the
0:02:50	first of all the to the remote modules
0:02:54	solving lost six
0:02:56	do about the if you are not just like to do not basically problem but
0:03:02	uh
0:03:04	image processing vision i want to see you developing a system that actually application and
0:03:12	many other issue
0:03:14	think about the channel
0:03:16	you need to collect a lot of training data because the existing systems uh i
0:03:24	let me
0:03:25	observations
0:03:28	we do not know what is
0:03:31	indicating that
0:03:33	and the optimized system so it's like that
0:03:36	uh nobles that's the goal go through an image that is why convolving
0:03:44	and just uh as an example are we talking about the tennis video analysis
0:03:53	you
0:03:56	for some
0:04:00	school
0:04:02	yeah so uh and that was just a very few men version of this is
0:04:08	that the linear
0:04:11	and uh so it's at a G is to you
0:04:16	an application and then
0:04:19	all services about i
0:04:37	okay so um
0:04:41	the conference here is uh concerned with advanced concepts and in a way
0:04:46	when you develop uh an interpretation system then uh in a sense that system is
0:04:52	advanced in its own right so i could be just talking about the video uh
0:04:56	the tennis video annotation system but then my focus will be more on the second
0:05:02	bullet point
0:05:04	as i already mentioned so suppose you want to adapt the system to some
0:05:09	other domain uh even quite a close domain and you will see that the applications i will
0:05:14	be uh talking about are very simple indeed nevertheless uh raising quite interesting
0:05:22	issues and challenges
0:05:24	and that if you want to
0:05:28	benefit from many years of effort and then try to use what you have and
0:05:35	to develop a new uh competence or capability then basically you have to
0:05:42	identify that you have a problem that you cannot cope with some input and uh
0:05:47	then you have to modify the system in an appropriate way and there are of course
0:05:52	other communities uh the speech community and the computer vision community that work on uh
0:05:59	transfer learning and uh so uh
0:06:06	i will not be addressing those issues but uh at the end once you have
0:06:11	adapted the system to some new application then
0:06:15	when i say adapt
0:06:17	i really mean develop a new capability then the system needs yet another functionality
0:06:24	it needs to know uh in which situation it is operating and it
0:06:33	should be able to classify the context uh in which it operates so that
0:06:38	it can automatically select the appropriate uh domain knowledge for its operation so
0:06:47	this is the system that we developed so basically it can analyse tennis video
0:06:53	the way uh
0:06:56	i will describe what the system looks like but in principle
0:07:02	the objective is that uh from the video input completely automatically you are able
0:07:09	to interpret what's going on to the point of awarding points uh generating the
0:07:16	score from the process now
0:07:19	i'm not talking about the uh style whole uh yeah we develop a system which
0:07:26	works uh from 2D standard broadcast video okay so that makes the
0:07:33	problem quite difficult but anyway so in principle when you break the video
0:07:38	into shots you want to know what's happening in each shot
0:07:43	well as or so seconds uh and that there is not only uh who actually
0:07:49	wins the rally and who should be awarded a point
0:07:57	no
0:07:59	probably unless you are young and have very good eyes uh you will
0:08:03	not be able to see the detail but this is just to illustrate the
0:08:07	complexity of the system
0:08:09	and it has uh quite a few levels of processing so initially the
0:08:18	uh video is broken into shots and then each shot is processed
0:08:24	separately basically uh and the
0:08:29	low level processing deals with the foreground-background separation
0:08:34	then the key components of the content are extracted which is the motion of the
0:08:40	ball and the players and then the system uh detects uh important
0:08:48	events
0:08:50	and uh one important event is when the ball changes direction and the way
0:08:56	it changes direction
0:08:58	and then eventually there is some high level interpretation process of these events so this
0:09:05	is a more digestible summary of the system okay but basically the ball tracking
0:09:13	is the most important you need to know where the court is uh you need to
0:09:17	detect these important events and there is a high level interpretation part which is
0:09:23	basically hidden markov model based
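[Editor's sketch] The event-detection step described above, spotting where the ball changes direction, could look roughly like the following. This is a minimal illustration, not the system's actual method: the function name `direction_changes`, the 45-degree threshold and the toy track are all assumptions.

```python
import math

def direction_changes(track, min_angle_deg=45.0):
    """Return indices of track points where the heading turns by more than min_angle_deg."""
    events = []
    for i in range(1, len(track) - 1):
        (x0, y0), (x1, y1), (x2, y2) = track[i - 1], track[i], track[i + 1]
        v1 = (x1 - x0, y1 - y0)  # motion into point i
        v2 = (x2 - x1, y2 - y1)  # motion out of point i
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue  # ball did not move, heading undefined
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        # Clamp to [-1, 1] to guard against rounding before acos.
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle > min_angle_deg:
            events.append(i)
    return events

# Hypothetical image-plane ball positions: rising, then falling after index 2.
track = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]
events = direction_changes(track)
```

Here `events` contains only index 2, the point where the toy trajectory turns by 90 degrees.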
0:09:27	now most of the modules that the system has use context in some way okay
0:09:36	so when i talk about context here it's not the context it's not the domain
0:09:40	where the system operates but it's the local context which is like the temporal or
0:09:44	spatial so when you want to interpret for instance uh what's going on you need to
0:09:50	know not only where the ball is but also where the players are so uh that is
0:09:54	the interaction between objects in the video uh so in principle you are interested in
0:10:01	interpreting every object in each frame but the neighbouring objects uh
0:10:12	also carry information which is very important and you want to use
0:10:15	this uh information jointly uh they provide contextual information and you want to use
0:10:22	this information jointly to make interpretation so in principle you have some sort of
0:10:27	domain knowledge which is acquired in some way either through learning or
0:10:31	partly through
0:10:33	yeah so you build in the prior knowledge uh and you are then comparing
0:10:39	observations with your model to make interpretation so this is a very generic uh indication that most
0:10:46	of the modules are dealing with contextual information many modules deal with contextual information uh over
0:10:53	time okay but the other modules deal with the spatial contextual information
0:11:00	and some of them with both
0:11:02	so the first one for instance is a module which is uh separating foreground
0:11:10	from background so you may wonder what happened here uh
0:11:17	because players disappear but basically it's the module which is building the mosaic so
0:11:22	you take video frames from a shot and relate them to each other
0:11:30	and uh basically that allows you to build a mosaic and anything that's moving
0:11:35	in the frame is wiped out because it is uh not persistent information and so you
0:11:42	have basically a background and then you can use the background to separate the foreground
0:11:48	properly so
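[Editor's sketch] The idea that moving content is "wiped out" when frames of a shot are combined can be illustrated with a per-pixel temporal median. This is a swapped-in simplification, not the speaker's mosaicing method: it assumes the frames are already registered to each other and are small grayscale grids. The names `estimate_background` and `foreground_mask` and the threshold are hypothetical.

```python
import statistics

def estimate_background(frames):
    """Per-pixel temporal median over aligned grayscale frames (lists of rows)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [statistics.median(f[r][c] for f in frames) for c in range(width)]
        for r in range(height)
    ]

def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [
        [abs(p - b) > threshold for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]

# Three tiny 2x3 frames: a bright "player" pixel (200) moves across a dark court (50).
frames = [
    [[200, 50, 50], [50, 50, 50]],
    [[50, 200, 50], [50, 50, 50]],
    [[50, 50, 200], [50, 50, 50]],
]
background = estimate_background(frames)       # the moving pixel never persists
mask = foreground_mask(frames[1], background)  # True only where the player is
```

Because the bright pixel occupies each location in only one of three frames, the median keeps the court value everywhere, and differencing a frame against that background recovers the moving object.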
0:11:51	that's one example of the type of functionality that the modules perform
0:11:58	the most important one once you have uh
0:12:02	uh the players and the ball extracted is to detect the events so you can
0:12:08	see that uh so it's the ball tracking uh process and it is
0:12:14	also detecting when the ball is changing direction and uh you know uh where the
0:12:21	court is that has been detected automatically it's a fully automatic system
0:12:25	uh and you also can detect players and from that you can derive
0:12:30	interpretation
0:12:32	this is uh
0:12:35	so these are the events that we have extracted in time and the
0:12:41	the sequence of these events and the position where they happen and the actions of
0:12:46	the players determine what's going on and you have a hidden
0:12:51	markov model which models the temporal structure of ball games in
0:12:57	general and so in tennis uh which allows you to interpret what's going on
0:13:03	and you can then decide who should be awarded the point at the end
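[Editor's sketch] Decoding a sequence of detected events with a hidden Markov model is commonly done with the Viterbi algorithm. The toy example below only illustrates that idea. The states, event names and all probabilities are invented for the illustration and are not taken from the system described in the talk.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observed events."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical play states and event vocabulary (assumptions, not from the talk).
states = ["serve", "rally", "point_over"]
start_p = {"serve": 0.8, "rally": 0.1, "point_over": 0.1}
trans_p = {
    "serve": {"serve": 0.1, "rally": 0.8, "point_over": 0.1},
    "rally": {"serve": 0.05, "rally": 0.7, "point_over": 0.25},
    "point_over": {"serve": 0.8, "rally": 0.1, "point_over": 0.1},
}
emit_p = {
    "serve": {"hit": 0.8, "bounce": 0.15, "out": 0.05},
    "rally": {"hit": 0.45, "bounce": 0.5, "out": 0.05},
    "point_over": {"hit": 0.05, "bounce": 0.15, "out": 0.8},
}
events = ["hit", "bounce", "hit", "out"]
path = viterbi(events, states, start_p, trans_p, emit_p)
```

For this toy model the decoded path is serve, rally, rally, point_over. A real system would also use the positions of the events relative to the court, as the speaker notes.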
0:13:10	okay so and this is an example of what the system would produce so here
0:13:17	on the left hand side it would actually tell you what's going on who was awarded
0:13:22	the point at one time at a tool training
0:13:26	okay so as i said we spent three years developing the system and we
0:13:32	were just working with one video and it happened to be a video of singles
0:13:36	and then somebody asked a question about what would happen if you actually applied it
0:13:41	to doubles and you know so it's a very simple small transition but nevertheless
0:13:48	uh
0:13:51	a significant enough transition for the system to fail so uh and uh
0:13:57	so that's one thing but
0:14:02	it's not only a question of the system failing you also would like to know uh
0:14:08	when it fails why it fails and can you learn something from it
0:14:12	anyway so the question is what are the mechanisms that are needed for the system
0:14:18	to realise that it's actually no longer competent to perform a certain functionality
0:14:25	and how can this functionality be extended
0:14:31	already mentioned so this is the project that has been sort of
0:14:36	motivating the work in this area and anyway so i already i think alluded
0:14:43	to these mechanisms that we need we need to detect anomaly we need to
0:14:48	transfer knowledge and we need to adapt interpretation processes and acquire new competences
0:14:56	that way
0:14:57	okay so
0:15:02	these are the mechanisms and what i'm going to focus
0:15:06	on is anomaly detection so i already talked for twenty minutes and i haven't even started the
0:15:11	topic of the lecture okay so uh these are the mechanisms that would
0:15:16	be normally needed but one of them is anomaly detection
0:15:22	oh if you look at
0:15:24	the
0:15:26	uh the definition of anomaly to start with it's
0:15:31	normally understood as something deviating from normality but how
0:15:38	normality is defined is very general it can be some sort of order
0:15:44	it can be a sort of statistical normality it can be a rule whatever so
0:15:48	it's uh very general there are also many synonyms and interestingly some of these uh
0:15:55	synonyms in general mean
0:15:58	deviation from normality but sometimes uh they have some uh additional nuance uh
0:16:04	and they may mean for instance
0:16:08	irregularity okay innovation so there is a
0:16:12	difference between uh anomaly and innovation because innovation usually implies a change
0:16:21	is of constant change you moving to some of the uh model of a proxy
0:16:27	experience
0:16:30	now what is the conventional model i think everybody knows that when you look for
0:16:35	anomalies you are normally thinking in terms of uh outliers of some distribution uh so
0:16:45	you have a gaussian for instance and
0:16:52	uh if you are making observations far away then you say well it must be an anomalous
0:16:58	observation because it's not consistent with my model of the data the experience
0:17:04	uh that i had in the past so one is
0:17:11	looking uh and basically the mathematical model is a statistical one in principle and
0:17:17	uh
0:17:23	sometimes you do not only work with a single observation but with multiple
0:17:29	observations and then you may be interested whether uh the distribution of the observations
0:17:35	is different from the distribution uh of your model and uh so you
0:17:41	could also be talking about uh anomaly in terms of the
0:17:45	shape of the distribution
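[Editor's sketch] The conventional single-observation notion of anomaly described above, an outlier of a Gaussian fitted to past data, can be written in a few lines. The threshold `k=3.0` and the sample values are assumptions chosen for illustration.

```python
import statistics

def fit_gaussian(samples):
    """Estimate mean and standard deviation from past observations."""
    return statistics.fmean(samples), statistics.stdev(samples)

def is_outlier(x, mean, std, k=3.0):
    """Flag x as anomalous if it lies more than k standard deviations from the mean."""
    return abs(x - mean) > k * std

# Hypothetical past experience: measurements clustered around 10.
past = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mean, std = fit_gaussian(past)

typical = is_outlier(10.1, mean, std)    # consistent with the model
far_away = is_outlier(14.0, mean, std)   # far in the tail of the model
```

With these numbers the fitted standard deviation is about 0.2, so 10.1 is accepted and 14.0 is flagged. The multi-observation case, where the shape of a distribution is compared, needs a different test.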
0:17:49	as i said anomaly detection has been of interest for a long time uh
0:17:54	and it really goes back to the nineteenth century uh people have been interested in
0:18:00	developing normal models so gaussian models uh for modelling various uh sets of
0:18:08	data observations and outliers have been detected by the model uh
0:18:16	when the observation is inconsistent with that model so over the uh hundred years i
0:18:22	suppose most of the work has been focusing on this type of concept of
0:18:27	uh anomaly and there are excellent surveys which uh make life quite easy and
0:18:33	uh recently quite a lot of work on anomaly detection comes from the
0:18:37	security and surveillance communities as they are very much interested in formulating the
0:18:43	problem of uh detecting something unusual as an anomaly detection
0:18:49	problem but although they may be using quite complex systems most of the uh
0:18:55	notions of anomaly in these papers are very close to the statistical
0:19:00	notion so even if you have complex systems with multiple layers of interpretation very
0:19:06	often people still uh look at anomaly uh from these models
0:19:14	so it can be represented in a very simple way as here so this
0:19:17	is your basic system which is performing something you have a sensor uh you have some
0:19:22	usually single hypothesis model
0:19:26	uh some distribution and uh this drives some action
0:19:34	and you are interested to know whether uh there is any
0:19:40	anomaly so you need some sort of anomaly detector and usually it would be
0:19:44	some sort of outlier detector and if it is an outlier then hopefully it will
0:19:47	affect the action so you will not perform what you would normally perform
0:19:56	now in complex systems like uh a video system tennis video system you need
0:20:02	to model like this every module okay
0:20:07	many of these modules are dealing with multiclass problems so you don't have just
0:20:12	a single
0:20:14	hypothesis you have multiple hypotheses which also introduces interesting complexity
0:20:21	into the equation you have
0:20:26	many levels of processing and some of these modules are dealing with
0:20:33	high level information uh using contextual information and uh so although
0:20:40	they may be interpreting the same sort of event they will be using
0:20:45	different sources of information and so all these uh complexities are somehow not catered for
0:20:53	by this conventional anomaly detection uh model so already mentioned so this is the list
0:21:02	of things so we have multiple models not just a single one with two hypotheses
0:21:07	model
0:21:09	importantly in machine perception
0:21:13	very often we use discriminative approaches rather than generative if using a discriminative approach you cannot
0:21:20	really talk about outliers because you just know whether things are on the right side of
0:21:24	the boundary or not but you completely lose the uh the idea of
0:21:32	whether the observation which you are trying to classify is an outlier or not
0:21:38	it is lost uh to the system so um and if you wanted to detect
0:21:44	anomaly
0:21:46	you would need to use both discriminative models to get better performance but also maintain a
0:21:52	generative model to know what's going on whether you are actually competent to make that
0:21:57	decision
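[Editor's sketch] The point about pairing a discriminative classifier with a generative model can be illustrated on a one-dimensional toy problem. The boundary tells you which side an observation falls on, while the class densities tell you whether the observation resembles anything seen before. All names and numbers below (`CLASS_MODELS`, `BOUNDARY`, `min_density`) are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Generative side: one (mean, std) per class. Discriminative side: a threshold.
CLASS_MODELS = {"A": (0.0, 1.0), "B": (4.0, 1.0)}
BOUNDARY = 2.0

def classify(x):
    """Discriminative decision: only says which side of the boundary x is on."""
    return "A" if x < BOUNDARY else "B"

def is_competent(x, min_density=1e-3):
    """True if x is plausible under at least one class model."""
    return max(gaussian_pdf(x, m, s) for m, s in CLASS_MODELS.values()) >= min_density

def decide(x):
    """Use the boundary for the label, the densities for the competence check."""
    label = classify(x)
    return label if is_competent(x) else "outlier"

near_a = decide(0.5)    # well inside class A
near_b = decide(3.5)    # well inside class B
far_off = decide(12.0)  # on the B side of the boundary but unlike any training data
```

The value 12.0 would be labelled B by the boundary alone. Only the generative check reveals that the system has no competence there.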
0:21:59	uh you have very often areas in the observation space where you have a genuine
0:22:07	ambiguity now if you have genuine ambiguity then the decisions you make
0:22:14	uh you have to be very careful about because you cannot necessarily interpret them
0:22:18	as an anomaly because you have an ambiguous situation you cannot have
0:22:23	confidence that it's going to be an anomalous observation
0:22:28	contextual reasoning already mentioned that the uh
0:22:32	existing systems are not ready yet to deal with that and hierarchical representation
0:22:39	but two more things uh data quality you need to know whether the observation
0:22:47	data you are working with is of the same quality as the data for
0:22:53	which the system has been designed you know that you make certain assumptions about the
0:22:58	quality of the data and if that quality changes then
0:23:02	the system has to differentiate between that situation uh because
0:23:10	it would be starting making errors okay and the anomalous situation where if you
0:23:17	have good quality data you can be pretty confident that if something is deviating then
0:23:22	it's going to be an anomalous observation
0:23:27	and uh
0:23:29	model pruning because very often one
0:23:34	introduces uh
0:23:36	a potential anomalous situation
0:23:40	by uh
0:23:44	your interpretation process because you want to make that process as fast as possible
0:23:48	so for instance if i am interested in object recognition and i know there is
0:23:53	uh i don't know half a million objects
0:23:57	or a hundred thousand objects you look at the various names in a dictionary whatever it
0:24:03	would be completely foolish to have a system which can interpret every single object
0:24:09	from that hundred thousand at once so you would prune that list to something
0:24:14	manageable and hopefully will deal with just uh a short list of hypotheses
0:24:19	rather than a hundred thousand and if you do that
0:24:23	then you may observe something which is anomalous but by your decision because you
0:24:28	have actually simplified the system uh processing strategy by making the assumption
0:24:36	that the object will come only from this subset and if it doesn't
0:24:41	then you should be able to detect it and recognise it and do something
0:24:46	about it so you can then inject more hypotheses into the system uh if none
0:24:51	of the existing hypotheses is uh adequate
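[Editor's sketch] The pruning and re-injection strategy described above can be illustrated with a toy nearest-prototype matcher. It works with a short active list and falls back to the full inventory only when nothing on the list fits. The one-dimensional features, the labels and `max_distance` are assumptions made up for the example.

```python
def best_match(feature, prototypes):
    """Return (label, distance) of the nearest prototype to a 1-D feature."""
    label = min(prototypes, key=lambda k: abs(prototypes[k] - feature))
    return label, abs(prototypes[label] - feature)

def interpret(feature, active, inventory, max_distance=1.0):
    """Interpret with the pruned list first; inject a hypothesis if none fits."""
    label, dist = best_match(feature, active)
    if dist <= max_distance:
        return label, active
    # No active hypothesis fits: the pruning assumption is violated, so
    # look in the full inventory and inject the matching hypothesis, if any.
    label, dist = best_match(feature, inventory)
    if dist <= max_distance:
        return label, {**active, label: inventory[label]}
    return None, active  # unknown even to the full inventory

# Hypothetical full inventory and a pruned working list.
inventory = {"ball": 1.0, "racket": 5.0, "player": 10.0, "umpire": 15.0}
active = {"ball": 1.0, "player": 10.0}

label_1, active_1 = interpret(9.6, active, inventory)   # fits the pruned list
label_2, active_2 = interpret(5.2, active, inventory)   # triggers injection
label_3, active_3 = interpret(30.0, active, inventory)  # nothing fits at all
```

The second call returns "racket" together with an enlarged active list. The third returns `None`, which is the case where a genuinely new hypothesis would have to be learned.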
0:24:56	so
0:25:00	i talked about the deficiencies of the normal anomaly concepts and just to show
0:25:06	you more examples of the different nature of anomalous situations so
0:25:12	very often
0:25:14	one is asked uh to solve the problem of spotting the difference okay so you
0:25:19	can consider it also as an anomaly detection problem so in this particular
0:25:24	situation we have a nice little object and i think everybody can spot
0:25:31	the difference there is a head of a cat hopefully or something uh in the
0:25:38	second picture are there any other anomalies
0:25:44	very good yeah
0:25:46	uh so this object has a slightly different uh angle any other
0:25:54	yeah and a little bit shifted very good so we are very good
0:25:58	anomaly detectors
0:26:02	but the uh the first instance was not all that will be is that all
0:26:07	these uh the other animal is represent about the you know very simple uh comparison
0:26:13	uh and for that computer systems are extremely good uh able to detect uh
0:26:19	the differences and uh okay so that's uh that's one
0:26:27	example we already talked about distribution drift we talked about model innovations
0:26:35	anyway what about the this case
0:26:41	are there any anomalies
0:26:52	well actually there are no differences the only difference uh maybe actually what you would
0:26:57	observe if you have very acute vision uh is the
0:27:02	difference in uh information the second image has been compressed okay so you
0:27:10	lose a little bit of high frequency information uh so obviously the compression
0:27:16	introduces noise and if i have an anomaly system which is to
0:27:21	detect differences based on some assumed noise distribution and uh suddenly the
0:27:28	noise characteristic changes then uh you know is that a difference or not so this should not
0:27:34	be detected as anomalies so data quality is an extremely important concept
0:27:40	in the process
0:27:43	already talked about the
0:27:47	uh contextual information and hierarchical representation which also exploits contextual information and
0:27:55	uh so you know here
0:27:58	every object in this image which is a famous painting uh
0:28:04	makes sense is uh fine but the relationship of these objects is obviously
0:28:11	unusual because you would not expect the locomotive to be jumping out of the fireplace
0:28:17	and uh so
0:28:20	uh it's another example of the type of anomaly that you would like to be
0:28:25	able to detect and
0:28:27	explored and the system should be exploited so this is the conventional system that uh
0:28:35	people have been using for almost a hundred years and um
0:28:40	and this is probably what we need okay so
0:28:46	the difference between that well this is the actual functioning system which is uh implemented
0:28:50	in some applications uh this just uh is the same thing it's the blue box
0:28:55	which has the sensor and the actions
0:29:01	when ten okay
0:29:04	the difference between this and that is that we have probably multiple hypotheses
0:29:10	uh for each uh module okay and also we have probably several
0:29:18	layers of interpretation not just a single layer uh
0:29:24	yeah so the higher layers would be using context and uh so that is the
0:29:28	relationship between those layers uh so if you want to detect
0:29:34	anomaly in a sensible way you then need the following you need something
0:29:39	that deals with the differences between contextual and non contextual processing
0:29:44	and that's uh the incongruence detector okay so uh yeah which is so if
0:29:52	you have an object if i go uh back to my
0:30:01	good really uh if i go here
0:30:07	and this is my scene graph or something and in principle i'm
0:30:13	uh trying to interpret every object okay but we know that if i am interpreting one
0:30:17	object uh in the graph then i'm using the contextual information provided
0:30:23	by other objects so in principle you can uh you are interpreting that object in
0:30:28	two different ways firstly just using the measurement information relating to that object
0:30:34	and secondly you use the measurement information and possibly prior knowledge about the configuration
0:30:42	or contextual information provided by the neighbours which will have impact on the
0:30:48	interpretation of the object so we have both contextual and non contextual
0:30:53	interpretation and you can then measure the incongruence between those two
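[Editor's sketch] One way to put a number on the incongruence between the non contextual and the contextual interpretation of the same object is a divergence between the two posterior distributions over labels. The talk does not say which measure is used. The symmetrised Kullback-Leibler divergence below is an assumption chosen for illustration, as are the labels and probabilities.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(p[k] * math.log((p[k] + eps) / (q[k] + eps)) for k in p if p[k] > 0)

def incongruence(non_contextual, contextual):
    """Symmetrised divergence between the two posteriors for one object."""
    return 0.5 * (kl_divergence(non_contextual, contextual)
                  + kl_divergence(contextual, non_contextual))

# Hypothetical posteriors over two labels for the same image region.
agree_a = {"ball": 0.9, "player": 0.1}
agree_b = {"ball": 0.9, "player": 0.1}
conflict = {"ball": 0.1, "player": 0.9}

low = incongruence(agree_a, agree_b)    # both interpreters say "ball"
high = incongruence(agree_a, conflict)  # context strongly disagrees
```

Identical posteriors give zero. The conflicting pair gives a value of roughly 1.76, which a detector could compare with a threshold.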
0:30:59	uh
0:31:03	but we need two other things
0:31:05	we need to assess both for the non contextual one and for the contextual one
0:31:12	uh whether we are dealing with ambiguity so how much
0:31:16	confidence we actually have in the interpretation that we are making so that's one
0:31:23	of the things that needs to be added in addition to incongruence uh
0:31:27	we need a module which is assessing data quality because that module
0:31:32	tells us whether we really should be
0:31:36	looking for anomalies or even if you detect something spurious uh whether
0:31:41	we should consider it as anomaly because if the data quality has changed then
0:31:47	we should not be uh
0:31:50	simply saying well it's anomalous situation because so uh yeah the
0:31:56	incorrect decisions so what about the change that will be induced by uh data of
0:32:02	different quality uh well we should be you know and the
0:32:09	and in addition to all that we need the uh standard
0:32:14	uh anomaly detection processes the outlier detection processes because even if my non
0:32:22	contextual and contextual decision making processes are uh
0:32:28	functioning well and uh to function well they would be probably based on discriminative
0:32:33	models then i will need
0:32:37	some way of deciding whether the observations are anomalous or not whether
0:32:43	they are outliers so i still need the conventional model okay of anomaly so
0:32:48	you can see that these two blocks are there for both uh non contextual and contextual
0:32:54	processes
0:32:56	but hopefully i will not be using them very often because if i did
0:33:01	the system would just be computationally complex so uh
0:33:07	ideally what uh you would like to do is to
0:33:12	invoke processing in these modules looking for anomalies only when you want to
0:33:19	trigger it to do so and the triggering can be done quite efficiently
0:33:22	by this incongruence detection process
0:33:27	now
0:33:28	can see that one of the mechanisms and only one there are others uh in
0:33:34	uh
0:33:35	the system that we need for detecting anomalies in perception systems is uh incongruence
0:33:41	detection and interestingly uh the work which uh well one of the original works
0:33:49	in this area uh was in the speech area uh
0:33:55	i don't know whether actually brno was involved in this or more uh was it
0:34:00	was just one of yours
0:34:02	okay yeah so you worked with hynek hermansky and um worked on the problem
0:34:09	of out-of-vocabulary word detection which is exactly the sort of uh typical
0:34:15	example of the problem we are dealing with you may have a uh you have
0:34:18	a uh two-layer speech system which is processing data uh detecting phonemes so
0:34:27	we have non contextual interpretation and contextual which combines the phonemes into words and you
0:34:33	may be interested in detecting whether there is any anomaly and that would be
0:34:37	an anomaly if for instance the phoneme detector functions very well gives
0:34:42	you very strong confidence in the interpretation but uh the
0:34:48	word-level interpretation it produces is garbage and it produces garbage simply because the word
0:34:53	doesn't exist in the dictionary
0:34:56	so this is an example of the situation uh that uh we would like
0:35:01	to detect and there was a five year project the DIRAC project funded by the
0:35:08	EU which was uh extending this basic idea to the image domain
0:35:16	and also continued with application in speech and uh so that was uh
0:35:21	but also by will get which it was then uh extending this work uh and
0:35:28	the most of the other work which are the definitely want role is uh
0:35:34	this name it yet the publications was published in the subsequent about two thousand and
0:35:39	i two thousand and well so
0:35:45	this is a little bit on the background about as i say is not directly
0:35:50	focus and finally on the incongruent so detection how do you uh the fact that
0:35:58	there is a difference between sort of a generic and the specific classifiers generally be
0:36:04	in uh non contextual one uh well depends on the application about the
0:36:12	and the if uh what is the implication of uh detecting such incongruence so that's
0:36:18	uh what dialogue has produced but maybe actually try to use this in a only
0:36:25	work on the tennis video interpretation it was not you know what the very citizen
0:36:30	fine mention be a very open dealing with situations where the decisions but ambiguous and
0:36:35	then you would not a bit on from that come from that you want but
0:36:40	and with a normal situation we dealt with situations and we'll see that in a
0:36:45	minute that uh we had several videos of tennis and
0:36:53	even several videos of tennis singles they all had a different court they were from
0:36:58	different tournaments so uh they had been recorded in different conditions and uh
0:37:05	some of them were noisier than others and it was pretty clear that you
0:37:09	need to know something about data quality if you want uh to make sensible
0:37:15	uh decisions about anomaly we still needed basically the original uh
0:37:23	technology so to speak of anomaly detection so outlier detection processes and uh
0:37:28	so i think these were or right and what do they monitoring also is needed
0:37:33	to measure whether distributions have shifted
0:37:39	no wit is uh
0:37:42	architectural system that is the state it it's a quite interesting because you can then
0:37:49	based on the various uh
0:37:52	uh
0:37:54	on the outcomes or on the analysis of the uh various modules in that
0:37:59	anomaly detection system you can then classify your anomalies or situations and
0:38:06	recognise different states so we can definitely recognise the state when you have no anomaly
0:38:11	but you can also uh identify situations when you are dealing with an unknown object
0:38:17	with noisy measurements you can uh detect the situation that you have unknown objects uh
0:38:23	when you have an incongruent or congruent labelling so all these various uh states of
0:38:29	uh anomaly can be detected and you get a much better idea of what's going
0:38:33	on
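[Editor's sketch] Combining the outputs of the data quality, outlier and incongruence modules into a named state could be as simple as a lookup on three flags. The mapping below is purely illustrative. The state names and the way the flags are combined are assumptions, not the taxonomy used by the speaker.

```python
def anomaly_state(quality_ok, is_outlier, incongruent):
    """Map three detector flags to an illustrative state label."""
    if not is_outlier and not incongruent:
        return "no anomaly" if quality_ok else "known object, noisy measurements"
    if is_outlier:
        # An outlier under good data points to something new; under degraded
        # data it may just be measurement noise.
        return "unknown object" if quality_ok else "unknown object or noisy measurements"
    # Not an outlier, but the contextual and non contextual labels disagree.
    return "incongruent labelling" if quality_ok else "incongruence possibly due to noise"

# Hypothetical detector outputs for three shots.
state_a = anomaly_state(quality_ok=True, is_outlier=False, incongruent=False)
state_b = anomaly_state(quality_ok=True, is_outlier=True, incongruent=False)
state_c = anomaly_state(quality_ok=False, is_outlier=False, incongruent=True)
```

The useful property is that the data quality flag changes how an outlier or an incongruence is read, which is the distinction the speaker stresses.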
0:38:34	so ideally actually what we want to do is to start with tennis and
0:38:40	move on to badminton and uh detect or identify the modules that
0:38:49	will not have competence to work on the input data and uh try to correct
0:38:57	the modules or inject knowledge so that we can actually
0:39:02	use the system for the new application
0:39:05	but the
0:39:09	but we started with something very simple and as i said just switching from singles
0:39:15	tennis to doubles so a very simple situation so if you consider that problem then
0:39:21	what would you expect
0:39:24	first of all
0:39:26	in doubles there are twice as many players
0:39:29	that's yeah but the court that is being used for the game is wider
0:39:36	so you have also the tram lines which uh
0:39:41	which are illegal basically in the case of singles but in the case of doubles
0:39:46	uh they are legal but everything else stays the same the rules
0:39:53	are the same so that was quite a nice
0:39:57	uh
0:39:59	challenge because it was not too complicated but at the same time quite
0:40:04	interesting to see what's going on and uh okay now in principle you would say
0:40:09	well it's obvious we can just count the players and the job is done but
0:40:14	in fact as anybody who works on images or video
0:40:21	knows detecting and counting objects is not as simple as
0:40:25	that uh partly because
0:40:31	the vision processes are not perfect but partly because the uh application domain allows
0:40:41	basically
0:40:43	uh
0:40:44	well this is not black and white so to speak
0:40:49	it's not either two or four in the game but there are other
0:40:53	moving objects so you have line judges for instance and normally they stand
0:40:57	still and when you uh do the mosaicing uh they
0:41:02	stay in the image but sometimes they move okay and if they move they
0:41:07	suddenly become moving objects and uh then unless you have some sophisticated mechanism of distinguishing
0:41:15	between players and other moving objects then you are stuck with a different count then
0:41:21	you have ball boys okay so the serve is played and it goes out and
0:41:28	the ball boy runs to collect the ball and uh so you have suddenly five
0:41:34	object detected that so if you actually look at and the statistics of a video
0:41:41	okay uh not just the then uh this is what you would to the observed
0:41:46	for singles okay so most of the time you would the detect just to plan
0:41:51	to agents movie nations about the we in the many occasions uh you detect a
0:41:57	human on and uh sometimes up to five so we have a distribution and equally
0:42:03	for doubles uh you have a distribution so you have two sets of this the
0:42:06	uh
0:42:08	you look for anomaly on the basis of distributions rather than single observations but
0:42:15	anyway so we are basically trying to differentiate between
0:42:21	two distributions one which is the model distribution and one which is the observed distribution
0:42:26	and look for differences anyway so that's what we have done
0:42:31	which is a sort of standard approach and here we have some
0:42:37	not the results but the data that we used so as you can see we
0:42:41	have five videos of different length so they are not necessarily complete matches
0:42:47	but they are all of a different situation so we
0:42:52	have australian japan tournament and us women and men singles doubles and these
0:43:03	are the numbers of the plays and
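
The idea of comparing count distributions rather than single observations can be sketched in Python as below. This is only an illustrative sketch, not the speaker's implementation: the total variation distance, the threshold of 0.3, the helper names and all the frame counts are assumptions made up for the example, since the talk does not say which distance or data layout was used.

```python
from collections import Counter


def count_histogram(counts, max_count=6):
    """Normalised histogram of per-frame moving-object counts (bins 0..max_count)."""
    if not counts:
        raise ValueError("need at least one frame")
    tally = Counter(min(c, max_count) for c in counts)
    return [tally.get(k, 0) / len(counts) for k in range(max_count + 1)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def is_anomalous(reference_counts, window_counts, threshold=0.3):
    """Flag a window whose count distribution departs from the reference model.

    The threshold is a hypothetical value chosen for this example.
    """
    p = count_histogram(reference_counts)
    q = count_histogram(window_counts)
    return total_variation(p, q) > threshold


# Made-up counts: singles footage mostly shows two moving objects,
# with occasional extras such as a line judge or a ball boy.
singles_reference = [2] * 70 + [1] * 10 + [3] * 15 + [4] * 4 + [5] * 1
singles_window = [2] * 14 + [3] * 4 + [1] * 2
doubles_window = [4] * 12 + [3] * 4 + [5] * 3 + [2] * 1

print(is_anomalous(singles_reference, singles_window))  # same domain
print(is_anomalous(singles_reference, doubles_window))  # shifted distribution
```

Accumulating counts over a longer window corresponds to the integration mentioned in the talk: a single frame with three or four moving objects is quite possible in singles, but a whole window dominated by four is not.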
0:43:10	and here we have some results okay so what we show here
0:43:18	as we are comparing distributions if you are using information just from one
0:43:23	shot then this will give you the performance that you would get
0:43:28	for various scenarios okay and basically here we are talking about the
0:43:36	detection of anomaly so we train on singles and when i talk about
0:43:43	anomaly i always assume that any training that is done is
0:43:50	or was done in the normal situation there are many cases where people
0:43:54	are actually trying to synthetically generate anomalies create anomalies
0:44:01	but i think it's a fundamentally wrong approach because if you design
0:44:08	a system you cannot possibly collect data on anomalous situations
0:44:14	and well then they would just become new classes and so really
0:44:20	the appropriate way of thinking about it is that you can train
0:44:24	the system only with the normal data and so all the training was
0:44:30	done only on singles we measured the level of noise and you can see for
0:44:35	instance that the was thirty and uh men single pay that much lower high noise
0:44:42	than the other two and that
0:44:47	uh
0:44:49	has a serious implication because if you look at the data
0:44:54	you can see that if you train so here we have
0:44:59	information okay here we trained on the australian women singles and japan singles okay
0:45:06	so you can see that
0:45:10	if you train on the good quality data and then you try to
0:45:17	test the system with data of different quality then you have problems and you
0:45:21	can see that here because this is basically the anomaly detection output for
0:45:27	the singles so we should not be detecting any anomalies because we are
0:45:31	dealing with the same domain the system was trained to recognise
0:45:36	and interpret the tennis singles and here we are actually having a problem because
0:45:43	of the noise condition we are detecting false anomalies whereas
0:45:50	when we actually train on data which is a little bit
0:45:54	more noisy then
0:45:57	none of the singles throws any anomalies but then we
0:46:02	have to do a little bit more integration to get actually the results the
0:46:08	anomaly detection working correctly so that also shows you that
0:46:14	one has to be very careful about data quality and its implications on the
0:46:18	anomaly detection process
0:46:21	the second task well the second anomaly that
0:46:27	we analysed is that the ball goes out in the tramlines and
0:46:32	okay
0:46:37	so the game should terminate
0:46:40	but it doesn't it just carries on and
0:46:44	again we have developed so what do we have well we
0:46:51	had to be very careful to make sure that we rule out
0:46:58	as anomalous situations which may be genuinely ambiguous because
0:47:05	of the data and the system itself anything very close to the boundary line
0:47:10	between the tramline and the singles court is ambiguous but the further away
0:47:16	you get from the boundary line you have more confidence
0:47:19	so we have introduced this confidence measure
0:47:23	as a filter to make sure that we are not trying to make
0:47:28	decisions about anomaly on data which is by its very nature
0:47:33	ambiguous
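
A confidence filter of this kind could look roughly like the Python sketch below. It is an illustration under stated assumptions rather than the system described in the talk: the Gaussian-shaped confidence function, the localisation uncertainty `sigma_m`, the threshold `min_confidence` and the function names are all hypothetical, and the bounce position is reduced to a single lateral coordinate in metres measured from the centre line. The half-width of 4.115 m follows from the standard singles court width of 8.23 m; the outer doubles sideline is ignored for simplicity.

```python
import math

# Singles sideline distance from the centre line (half of the 8.23 m singles width).
SINGLES_HALF_WIDTH_M = 4.115


def boundary_confidence(x_m, sigma_m=0.15):
    """Confidence in [0, 1) that grows with distance from the singles sideline.

    sigma_m is an assumed localisation uncertainty of the ball tracker.
    """
    d = abs(abs(x_m) - SINGLES_HALF_WIDTH_M)
    return 1.0 - math.exp(-(d * d) / (2.0 * sigma_m * sigma_m))


def classify_bounce(x_m, min_confidence=0.9, sigma_m=0.15):
    """Label a bounce, or decline to decide when it is too close to the line."""
    if boundary_confidence(x_m, sigma_m) < min_confidence:
        return "ambiguous"  # filtered out: no anomaly decision is attempted
    return "tramline" if abs(x_m) > SINGLES_HALF_WIDTH_M else "singles_court"


for x in (3.0, 4.2, 4.8):
    print(x, classify_bounce(x))
```

Only bounces labelled `tramline` or `singles_court` would be passed on to the anomaly reasoning; the `ambiguous` ones are dropped, which trades coverage near the line for fewer false anomalies.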
0:47:38	coming back again to my point that we are always using only the information that
0:47:45	you acquire in the normal situation the normal
0:47:50	models so basically
0:47:54	and the interpretation process associated with it so we have not really
0:48:00	designed the system simply to detect specifically anomalies it is doing normal
0:48:06	processing and detecting anomalies as a result of that and
0:48:12	anyway
0:48:14	this is just an illustration of the interpretation process
0:48:21	so when there is a perceived
0:48:24	error okay when the play should terminate and
0:48:28	actually the game continues we are following all the possible interpretations all
0:48:34	the possible interpretations of what may happen and on the basis of
0:48:40	that we are able to make a decision whether there is an
0:48:44	anomaly because the game continues without basis and
0:48:53	the detection is based on measuring incongruence between
0:48:57	contextual and non contextual labels basically so we have our event detection which
0:49:04	gives you non contextual labels and we have the contextual
0:49:08	processing which takes into account the sequences of events over time so as
0:49:15	uh
0:49:17	is normally the case you have basically as i already explained you have
0:49:21	two interpretations one which is contextual one non contextual and you have to measure whether they
0:49:27	are incongruent
0:49:28	and one possible way of measuring it is using a sort of bayesian surprise measure which
0:49:34	is a form of divergence on discrete distributions of labels but the problem
0:49:41	with that measure is that it's very sensitive if you have
0:49:47	a probability which moves from point ninety five to one then suddenly you move into
0:49:55	infinity and it causes trouble and so we have actually adapted
0:49:59	that measure and used something which was practically much more efficient
0:50:06	so we chose the top label the best supported label for each
0:50:13	of the contextual and non contextual hypotheses and just measured the difference between those two
0:50:18	and you can actually show that in the two class case that we consider in
0:50:23	this particular application whether the ball was out or not we ended
0:50:30	up with a very simple way of measuring incongruence between the states and when
0:50:36	we did that on the videos that we trained with so we trained on singles
0:50:42	and tested on singles we had no anomalies detected so no problem as you
0:50:47	would expect and then on doubles well with the current system whatever limitations it
0:50:54	has it certainly detected some anomalies
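
The sensitivity of a divergence-based surprise measure, and the simpler top-label alternative, can be illustrated with the Python sketch below. It is a sketch under assumptions, not a statement of the speaker's exact formula: the Kullback-Leibler divergence stands in for the bayesian surprise measure, and `top_label_incongruence` is one plausible reading of "choose the top label for each hypothesis and measure the difference", namely averaging the absolute probability gaps at the two top labels. Both function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete label distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with zero probability under p contribute nothing
        if qi == 0.0:
            return math.inf  # q rules out a label that p still supports
        total += pi * math.log(pi / qi)
    return total


def top_label_incongruence(contextual, non_contextual):
    """Bounded incongruence based only on the best supported label of each side.

    Averages the absolute probability gap at the contextual top label and at
    the non contextual top label; the result always lies in [0, 1].
    """
    top_c = max(range(len(contextual)), key=contextual.__getitem__)
    top_n = max(range(len(non_contextual)), key=non_contextual.__getitem__)
    gap_c = abs(contextual[top_c] - non_contextual[top_c])
    gap_n = abs(contextual[top_n] - non_contextual[top_n])
    return 0.5 * (gap_c + gap_n)


# Labels are (in, out). The two sources nearly agree, yet the divergence is infinite.
nearly_agree = ([0.95, 0.05], [1.0, 0.0])
print(kl_divergence(*nearly_agree), top_label_incongruence(*nearly_agree))

# The two sources genuinely disagree about whether the ball was out.
disagree = ([0.9, 0.1], [0.2, 0.8])
print(kl_divergence(*disagree), top_label_incongruence(*disagree))
```

With two classes the two probability gaps are equal, so under this reading the measure reduces to the absolute difference between the two probabilities of "out", which is consistent with the very simple two class form mentioned in the talk.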
0:50:58	some were undetected not many and a small number of false positives and then
0:51:07	if you associate the anomalies with the area of the court where they happen
0:51:13	they identified the tramlines so it was very nice and it was very easy then to
0:51:19	use that association and we have another paper elsewhere which
0:51:26	then takes the output of this module of this anomaly detection
0:51:32	module and through this association is able to redefine the rule base basically
0:51:38	saying well to remove the anomalies the court size has to change and
0:51:45	it has to use the tramlines to be able to interpret
0:51:50	this content successfully so you know
0:51:54	i think i
0:51:56	talked about and gave you examples of all the mechanisms that are needed for
0:52:02	anomaly detection in this particular application in principle you need this context detection which
0:52:08	is about domain detection for a real complex system
0:52:15	to acquire new competence and once it has then it has to be able to
0:52:19	pick out which domain it is operating in and
0:52:25	take the appropriate knowledge base and this is the basic system that is used
0:52:30	in the interpretation at a low or a high level and this is the
0:52:34	anomaly detection mechanism that's the module that is still needed and
0:52:40	that would be added to the system to
0:52:44	operate successfully so that brings me to the conclusion i hope that i have
0:52:50	persuaded you that anomaly detection in machine perception requires more mechanisms than
0:52:56	what is normally offered by the conventional model and what
0:53:02	these mechanisms are and how useful they are in practical applications thank you very much for your attention
0:53:40	yeah
0:53:50	the use a system
0:53:52	well i think you know what goes into the anomaly detection system i think
0:53:57	it's generic but the application is specific okay so obviously our solutions will not work
0:54:04	for your problem but i think the notion of data quality
0:54:10	is very important and also the approach to the problem that one should
0:54:17	be trying to train the system just with the normal data but
0:54:25	you are also shooting yourself in the foot because if you have examples of
0:54:28	anomaly then it would help you to improve the design of the system nevertheless
0:54:33	you know the system then will be able just to detect what you presented
0:54:38	to it in training and so there is a little bit of
0:54:42	a dilemma yeah
0:54:53	i
0:55:12	okay
0:55:14	basically i think all the videos that we used
0:55:18	were from professional matches and the cameras were fixed but panning okay this is why
0:55:24	we needed to do the mosaicing
0:55:32	in principle
0:55:36	at least we always use the prior information that there is the ground plane so
0:55:41	based on that information you can sort of calibrate the camera with respect
0:55:47	to the scene and so it doesn't have to be fixed it
0:55:52	can move the solution is not just for a single position
0:55:57	of the camera you can always calibrate the system for any position and
0:56:02	this is what actually happens when a remote uses them
0:56:19	yeah
0:56:32	i think it was more to do with access uh i think the video speech
0:56:38	we go around the through internet ordering to internet maybe unique go but we didn't
0:56:43	look into it but we knew that it would be difficult to get
0:56:47	the copies of the same but on broadcast
0:56:50	although we have a one of two with that B C so
0:56:56	a game and uh yeah
0:57:10	it that uh it's not regulate and i think that would say that we have
0:57:15	maybe because of the confidence measure we are probably losing half of
0:57:19	the tramline
0:57:21	where we are not making decisions because of
0:57:27	the ambiguity and because of the accuracy of the system and it actually gets
0:57:31	less accurate for the far part of the court okay because the further away from the
0:57:37	camera the more degraded the accuracy
0:57:42	information
0:58:03	well i'm i hope it will generate some other one is but uh that's an
0:58:07	interesting proposition
0:58:35	thus
0:58:41	for
0:58:44	which
0:58:47	and

Anomaly detection in machine perception systems

Invited talks

Josef Kittler