0:00:15 Good morning everybody. My name is [inaudible], from the University of Strathclyde, Glasgow, and I'm going to present our work, titled "DSP Embedded Smart Surveillance Sensor with SWAD-Based Tracker". If I speak too fast, just tell me to slow down. The other co-authors of the paper are [inaudible] from Texas Instruments and Professor [inaudible], also from the University of Strathclyde.
0:00:42 This is the outline of my presentation. First of all I will give a brief introduction to set up the scene, and then I will state the objectives of our work. After that I will show an overview of the entire system, and then I will talk in detail about the video analytics within the system. I will then show some results to benchmark our work, and finally I will draw the conclusions and show some future work.
0:01:11 Video surveillance is the monitoring of an environment by means of video cameras. This is convenient because we can use multiple cameras to surveil a wide area, and a single operator, sitting here, is able to look at fifteen video feeds at the same time. Another benefit is that we can analyse the video, and we can store it for future access.
0:01:34 But there is a problem: what happens when we have too many video feeds and the operator is asleep? In this case a suspicious individual is walking around, but the surveillance personnel is sleeping. So the problem is the level of attention, the reaction time and crime prevention. We don't want to use the surveillance footage afterwards, for a trial; what we want is to react straight away.
0:02:01 Video analytics is the semantic analysis of video data by computer systems, using image and video processing techniques. In this case we talk about smart surveillance, because we have the video feeds and we apply smart algorithms to analyse the video. When we have video analytics, what we want to achieve is smart sensors: the video analytics is embedded on processors which are then attached to the cameras, so we can create smart units and deploy the intelligence at the edge of the network.
0:02:38 When we have multiple smart surveillance sensors, like in this case, we can surveil a whole building in real time. We don't need to send all the video streams to a central station; we just need to send the relevant information, for example that an object is abandoned, or that a person has to be tracked.
0:03:02 So the aim of this work is to create a smart surveillance sensor for automatic tracking using a PTZ camera, that is, a camera that can be commanded to follow an object. The objectives are to implement the smart algorithms on a DSP board, to automatically control the PTZ from the board, and to be able to activate and deactivate the tracking algorithm from remote.
0:03:32 This is an overview of the system. In the centre you can see the EVM, which is the DSP board, and the PTZ, which is our camera. When they are connected together, we can process the video streams from the camera on the DSP; in this case we can talk about a smart sensor. The board is a Texas Instruments DM6437 EVM, which is a fixed-point DSP, and there is an Ethernet connection between the two. The camera is a PTZ unit which can pan 360 degrees and tilt [inaudible] degrees.
0:04:12 The software is implemented in C, with a minimal system on the board. We also have server software on the EVM, so we can send commands to activate and deactivate the algorithm remotely, and we run in real time, at more than twenty-five frames per second. On the PTZ we have an HTTP server, but this is proprietary, so we don't change anything there; we just send commands to drive the PTZ.
0:04:37 And this is the video analytics. Basically, we acquire the video stream, we de-interlace and decimate it, so we have smaller frames to process, and then we apply our tracking algorithm. The result of the tracking algorithm is used to control the pan and tilt: we send commands to the camera so we can follow the target, and this closes the loop.
0:05:02 So this is how: the video stream is acquired in YCbCr, we de-interlace it, and we discard the chrominance components, retaining only the luminance component, since the algorithm actually works on grayscale images. And then we decimate, so we have smaller frames.
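As a rough sketch of this front-end step (the 4:2:2 UYVY byte order and the decimation factor of 2 are my assumptions for illustration; the talk does not give these details):

```c
#include <stddef.h>
#include <stdint.h>

/* Extract the luminance plane from an interleaved YCbCr 4:2:2 frame
 * (UYVY byte order assumed) and decimate it by 2 in both directions,
 * so the tracker works on a quarter-size grayscale image. */
void luma_decimate(const uint8_t *uyvy, size_t w, size_t h,
                   uint8_t *gray_out /* (w/2) x (h/2) */)
{
    for (size_t y = 0; y < h; y += 2) {
        for (size_t x = 0; x < w; x += 2) {
            /* In UYVY, the Y sample of pixel x on row y sits at
             * byte offset (y*w + x)*2 + 1; chroma bytes are skipped. */
            gray_out[(y / 2) * (w / 2) + (x / 2)] =
                uyvy[(y * w + x) * 2 + 1];
        }
    }
}
```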
0:05:22 The tracking algorithm is based on template matching: we use a sum of weighted absolute differences, which is similar to the SAD, and then we have an adaptive template update. More details about this algorithm are given in the paper.
0:05:42 So, starting from frame i we have a region of interest Ri, and we try to find the best match for the template Ti. As you see here, we have the region of interest and we have the template, and we try to find the best match within this region of interest; this is the basic concept of template matching. The region of interest for the next frame is then defined as the area surrounding the best match, so in that case we have Ri+1, and this is our new region of interest.
0:06:16 To find the best match we minimise the SWAD coefficient you can see here. SWAD basically stands for sum of weighted absolute differences, where the weighting kernel is a Gaussian kernel. This is because we want to give more weight to the pixels in the centre of the target, and to discount the pixels at the edges of the template, which may belong to an occluding object or to the background.
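A minimal sketch of this matching step, with an illustrative 3x3 template and a Gaussian-like integer weight kernel of my own choosing (the actual kernel size and weights are not given in the talk):

```c
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

#define T 3  /* template side, illustrative only */

/* Gaussian-like weights: centre pixels count more than edge pixels,
 * so edge pixels belonging to an occluder or to the background
 * contribute less to the score. */
static const int W[T][T] = {
    {1, 2, 1},
    {2, 4, 2},
    {1, 2, 1},
};

/* Sum of weighted absolute differences between template tpl and the
 * TxT block of frame at (bx, by); stride is the frame width. */
long swad(const uint8_t *frame, int stride, int bx, int by,
          const uint8_t tpl[T][T])
{
    long s = 0;
    for (int y = 0; y < T; y++)
        for (int x = 0; x < T; x++)
            s += (long)W[y][x] *
                 abs(frame[(by + y) * stride + (bx + x)] - tpl[y][x]);
    return s;
}

/* Exhaustive search over the region of interest: return the (x, y)
 * of the block minimising the SWAD coefficient, i.e. the best match. */
void best_match(const uint8_t *frame, int stride,
                int rx, int ry, int rw, int rh,
                const uint8_t tpl[T][T], int *mx, int *my)
{
    long best = LONG_MAX;
    for (int y = ry; y <= ry + rh - T; y++)
        for (int x = rx; x <= rx + rw - T; x++) {
            long s = swad(frame, stride, x, y, tpl);
            if (s < best) { best = s; *mx = x; *my = y; }
        }
}
```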
0:06:48 To update the template, once we have found the best match, we compute the template for the next frame: we start from the current template and the best match, and we fuse them together using this formulation, which is basically an IIR filter driven by a learning factor. In this way we can incorporate changes in the appearance of the target into the template, keeping it up to date for the tracking in the next frames.
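The update rule just described, an IIR blend of the old template and the best match under a learning factor, can be sketched as follows. The integer fixed-point weighting is my assumption about how a fixed-point DSP would avoid floating point; the actual formulation is in the paper:

```c
#include <stdint.h>

/* Template update: T[i+1] = (1 - alpha) * T[i] + alpha * B[i],
 * where B is the best-matching block found in the current frame and
 * alpha_num/256 plays the role of the learning factor. Integer
 * arithmetic with rounding stands in for fixed-point DSP math. */
void update_template(uint8_t *tpl, const uint8_t *best, int n,
                     int alpha_num /* 0..256 */)
{
    for (int i = 0; i < n; i++)
        tpl[i] = (uint8_t)(((256 - alpha_num) * tpl[i] +
                            alpha_num * best[i] + 128) >> 8);
}
```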
0:07:19 Once we have the position of the target we can control the PTZ, that is, the camera, and we do this through simple HTTP requests sent to the HTTP server on the camera. Here you can see some commands for the PTZ: basically we have the IP address of the camera, the user name, and the command itself, which is six bytes sent to the camera. This is done from the DSP on the board to the camera over the Ethernet network. To decide how to control the PTZ, whether to move up, down, left or right, we detect whether the best match falls in the top region, the bottom region, the left region or the right region of the frame. The idea is that if the best match is near an edge, it is likely that the target is going out of the field of view, so we send the command and drive the PTZ up or down, left or right. In this way we are able to control the PTZ and follow the target.
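The edge test can be sketched like this; the zone margin and the returned command codes are placeholders of mine (the talk only says the real commands are six-byte strings sent over HTTP to the camera's proprietary server):

```c
/* Decide which way to drive the PTZ from the best-match position.
 * If the match centre enters a border zone of the frame, the target
 * is probably leaving the field of view, so we pan/tilt towards it.
 * Returns a placeholder command code, not the camera's real 6-byte
 * protocol: 'U','D','L','R' for the four directions, 0 for "stay". */
char ptz_command(int cx, int cy, int frame_w, int frame_h, int margin)
{
    if (cy < margin)            return 'U'; /* near top edge  -> tilt up   */
    if (cy >= frame_h - margin) return 'D'; /* near bottom    -> tilt down */
    if (cx < margin)            return 'L'; /* near left edge -> pan left  */
    if (cx >= frame_w - margin) return 'R'; /* near right     -> pan right */
    return 0;                               /* target well inside the view */
}
```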
0:08:22 These are some frames taken from the memory of the DSP. You can see that the black box is the region of interest and the red box is the target, that is, the best match; on the top left-hand side you can see the template for the current frame. As you can see, the target is moving, and at the top you can see the template is updated with the changes, so we can always find the best match.
0:08:51 For comparison we also use accuracy and precision. Basically we have the position given by the tracker and the position from the ground truth, and we compute the Euclidean distance between them: accuracy is the mean of this distance, and precision is its standard deviation.
0:09:06 We applied the algorithms, with MATLAB implementations, to four sequences. For the first sequence you can see that basically all the trackers follow the target, but the SAD and the NCC, the normalised cross-correlation, perform worse, because they are fooled by the pixels at the edges of the template. As you can see, in the middle part of the sequence they are fine, but they fail when the person in the video is partially occluded.
0:09:43 In the second sequence you can see that the normalised cross-correlation and the SAD diverge, which means that they lose the target, while the mean shift and the SWAD can still follow the target.
0:09:56 These are the other two sequences. Again we see that the normalised cross-correlation and the SAD, in the first graph, diverge, so again they lose the target. In the last example we have a lot of occlusion, and the mean shift also loses the target. So basically the SWAD-based tracker performs better than the SAD, the NCC and the mean shift across these sequences.
0:10:23 And here there are some numbers on accuracy and precision. You can see that the accuracy of the SWAD tracker is always lower, that is, better, than that of the other trackers across all the sequences, and the precision is usually lower as well. This proves once again that we have good tracking performance.
0:10:40 For the execution time: the algorithm is implemented on the DSP on the board, and the SWAD block matching takes seven milliseconds, so the whole processing is well below forty milliseconds per frame. This means we process at more than twenty-five frames per second, so we achieve our aim, which is real-time processing. This efficiency is obtained through intrinsics, which are C functions implemented for the particular architecture, in this case the fixed-point DSP architecture. We use the intrinsics for the sum of absolute differences, which work on groups of four bytes, or four pixels: in one cycle of the SWAD matching loop we process four pixels, so basically we cut the computation down by a factor of four.

0:11:41 As an example, the non-optimised version of the same algorithm takes sixty-three milliseconds, which is nine times more.
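A portable illustration of the idea: the inner loop consumes four packed pixels per iteration. On the C64x-family fixed-point DSPs this is the kind of work that intrinsics such as `_subabs4` and `_dotpu4` do in very few cycles; that mapping is my assumption about the implementation, since the talk only says SAD intrinsics working on groups of four bytes were used:

```c
#include <stdint.h>

/* Absolute difference of four packed bytes, then sum them: a portable
 * stand-in for what a C64x intrinsic pair does on one 32-bit word.
 * Processing 4 pixels per loop iteration is what cuts the block
 * matching cost by roughly a factor of four. */
static uint32_t sad4(const uint8_t *a, const uint8_t *b)
{
    uint32_t s = 0;
    for (int k = 0; k < 4; k++)
        s += (uint32_t)(a[k] > b[k] ? a[k] - b[k] : b[k] - a[k]);
    return s;
}

/* SAD over an n-pixel row pair, n a multiple of 4. */
uint32_t sad_row(const uint8_t *a, const uint8_t *b, int n)
{
    uint32_t s = 0;
    for (int k = 0; k < n; k += 4)
        s += sad4(a + k, b + k);
    return s;
}
```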
0:11:50 This is a working example of our system. You can see the board that drives the PTZ: the video stream from the PTZ goes into the board, the board analyses the video and drives the camera to follow the target. This is a recorded demo.
0:12:06 This is taken from a remote display, from the remote viewer. You can see that the camera is moving to follow the target as the target moves left to right, and here he goes far away from the camera. Even as he moves away from the camera, the algorithm is still able to track the target and control the PTZ, so we always keep the target in the field of view.
0:12:40 In conclusion, I presented a DSP embedded smart surveillance sensor which uses a PTZ camera to follow the target as it tries to move out of the field of view. The video analytics runs on the DM6437 fixed-point DSP, and the tracker we use is the SWAD-based tracker. The results show high accuracy and precision under partial occlusion.
0:13:08 For future work, we will try to include complete occlusion handling as well. This is taken from a paper of ours which is about to be published: here you see a video with the SWAD-based tracker without occlusion handling, and you can see that the tracker loses the target as it becomes occluded, while with the occlusion-handling technique we are able to recover the target as it comes out of the occlusion. So for future work we will try to implement this feature on the board as well.
0:13:47 This concludes my presentation. Thank you for listening, and if you have any questions I'm happy to answer.
0:14:21 At the moment we don't use the zoom feature of the camera, so yes, when the target moves close to the camera, the size of the target on the screen gets larger, and for simplicity we don't handle that. But, as you can see, this is partly solved because we update the template: as you see here the target gets smaller, and as he comes closer to the camera, we can incorporate the changes of the target into the template. At the moment we don't adjust the size of the template to the target; that's another thing to do in the future.
0:15:22 Right, this tracker is not for face tracking or for any particular object: it is generic target tracking, so it works as long as there is a target with, let's say, a good texture, so that you can discriminate the target from the background. Here we start from the face as an example, and then he moves closer to the camera; obviously the face becomes too big for the template. So it is generic, for any object, not only for faces.
0:16:21 You mean for the future work I mentioned? Okay. In that paper, to deal with complete occlusion, basically what we do is we don't update the template when the target is under occlusion. Instead of updating the whole template at the same time with the same weight, we have different weights for all the pixels in the template, so when the target goes under occlusion we don't update the occluded side, only the visible one. Eventually, when you are updating only a few pixels on one side, it means the target is going into occlusion, so over the next few frames you can discern that it is occluded. In that case you don't update anymore, and you say the target is occluded; then, since you have not updated the occluded side, when the target comes out of the occlusion the template is preserved, so you can again find the best match for your target.
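As a sketch of the mechanism just described (the per-pixel residual threshold, the blending rule and the visibility count are my own illustrative choices, not values from the paper):

```c
#include <stdint.h>
#include <stdlib.h>

/* Occlusion-aware template update: each pixel is refreshed only if it
 * still resembles the template (small residual). Pixels covered by an
 * occluder are left untouched, so the stored appearance survives the
 * occlusion. If too few pixels qualify, declare the target occluded
 * and freeze the template entirely until it reappears.
 * Returns 1 if the target is judged occluded, 0 otherwise. */
int update_with_occlusion(uint8_t *tpl, const uint8_t *best, int n,
                          int max_residual, int min_visible)
{
    int visible = 0;
    for (int i = 0; i < n; i++)
        if (abs((int)best[i] - (int)tpl[i]) <= max_residual)
            visible++;
    if (visible < min_visible)
        return 1;               /* occluded: keep the template as-is */
    for (int i = 0; i < n; i++)
        if (abs((int)best[i] - (int)tpl[i]) <= max_residual)
            tpl[i] = (uint8_t)((tpl[i] + best[i] + 1) / 2); /* blend */
    return 0;
}
```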
0:17:35 Yes, it can be adapted for that.
0:17:43 Well, usually in surveillance you have three components: the detection algorithm, the tracking algorithm, and then the classification, or something like that. This is only the tracking. To select the target you can either do it manually or you can use an automatic detection algorithm. Usually in surveillance systems you have a person driving the PTZ, trying to find something; they would then leave the PTZ on the target, and from there you would use this algorithm to track.
0:18:41 Okay, the template... it depends on the learning factor. In this case, as we process at more than twenty-five frames per second, we give more weight to the previous template than to the best match, but in a real application you can choose the learning factor according to what you want to give more weight to. If you want to be conservative, if you want to preserve the template, then you give more weight to your previous template. If you want to adapt very fast, then you give more weight to the best match; in that case you give, for example, thirty percent to the template and seventy percent to the best match, so you are able to incorporate the changes into the template faster.