0:00:16 This work was actually done by my student, who is right now doing an internship, so he could not come to present it himself; he is still working on it.
0:00:26 So the basic idea here was this: we all know that voltage scaling helps in reducing the energy consumption, but when you scale aggressively, you introduce errors into the system. Those errors can be in the data path; those errors can be in the memory. The question is: can you compensate for all the errors that you introduce because of voltage over-scaling, not by using standard ECCs or something, but by using algorithm-specific techniques, so that the overhead would be small? That's the intention.
0:01:10 So, um, this is for a low-energy codec. On the left hand side, if I can get this to work... okay, so here you have low energy and low quality, and you could have high energy and high quality. What we were trying to see is whether high quality at medium or even low energy is possible.
0:01:34 A couple of slides with the background. The sources of errors in the memories that we were looking at, which were all six-transistor SRAM cells, are primarily process variations, mismatches between the different cells of an SRAM. The modelling was done with respect to random dopant fluctuation (RDF) affecting the threshold voltage V_T. What happens is that with technology scaling, the threshold voltage variation increases, and that is shown over here on the right with sigma V_T.
0:02:12 These are the bit error rates of the six-transistor SRAM cells for different voltage levels. If, instead of the nominal 0.95 volts, you drop down to 0.75 volts, you get a 10^-4 bit error rate. If I become more greedy and more aggressive and go down to about 0.68 volts of operation, the bit error rate increases spectacularly to 10^-3. For most of my presentation I will be operating at either 10^-4 or, sorry, 10^-3. So from 0.9 volts at 32 nanometers you are dropping down to 0.75; some of the synthesis results are for 45 nanometers, where we actually go down from one volt to about 0.8.
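The voltage-to-error-rate numbers above can be turned into a simple fault-injection sketch. This is an illustrative assumption about how such errors might be modelled, not the speaker's actual setup: each SRAM bit flips independently with the quoted bit error rate.

```python
import random

# BER values quoted in the talk for 6T SRAM cells at scaled supplies;
# the independent-bit-flip injection model itself is an assumption.
BER_AT_VDD = {0.95: 0.0, 0.75: 1e-4, 0.68: 1e-3}

def inject_errors(word, width, ber, rng=random):
    """Flip each of the `width` bits of `word` independently with
    probability `ber`, modelling memory errors under over-scaling."""
    for bit in range(width):
        if rng.random() < ber:
            word ^= 1 << bit
    return word
```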
0:02:59 Now, the data path. Again, let's say I use a twelve-bit adder for my discrete cosine transform; all of the transform computations are done with hardwired shifts and adds.
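As an aside, a hardwired shift-and-add multiplication looks like this. The constant 181 (roughly 128 times the square root of two, common in integer DCTs) is a hypothetical example, since the talk does not list the actual constants used.

```python
def mul_181(x):
    """Multiply by the fixed constant 181 using only shifts and adds,
    the way a hardwired datapath would: 181 = 128 + 32 + 16 + 4 + 1."""
    return (x << 7) + (x << 5) + (x << 4) + (x << 2) + x
```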
0:03:12 The model that is used over here is as follows. t_FA is the nominal delay of a full adder, and all of these results were obtained by actual synthesis. t_systematic is the delay due to systematic variation, of the order of five picoseconds, against a nominal of about thirty-five, at 0.9 volts to one volt. Then there is the delay due to random variation, with a sigma_FA, which is the variance of the full-adder delay.
0:03:41 Now, if I am looking at any one of these bit positions up here, there are all of these critical paths that go through it. In a minute I will show you how we calculate the probability of error by summing up the effect of an error in each of these paths: the probability of error in this position, the error in this position, and so on. Basically, the delay of a carry chain that starts at one full adder and ends at another full adder can be represented in this particular form. For the chain delay, it does not matter where the chain starts, as long as the length of the chain is the same. It depends upon these values, which are obtained by circuit simulations, and then we use those values. So the assumption here is that the delay from bit one to bit ten is the same as the delay from bit two to bit eleven.
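The chain-delay model just described can be sketched as a small Monte-Carlo estimate, assuming Gaussian per-stage random variation; the delay numbers used below are placeholders, not the synthesized values from the talk.

```python
import random

def chain_delay(length, t_fa, t_sys, sigma_fa, rng):
    """Delay of a carry chain of `length` full adders: nominal per-stage
    delay, plus a systematic offset, plus Gaussian random variation.
    As in the talk, it depends only on the chain length, not on which
    bit position the chain starts at."""
    random_part = sum(rng.gauss(0.0, sigma_fa) for _ in range(length))
    return length * t_fa + t_sys + random_part

def timing_error_prob(length, t_clk, t_fa, t_sys, sigma_fa,
                      trials=20000, seed=1):
    """Fraction of trials in which the chain misses the clock period."""
    rng = random.Random(seed)
    misses = sum(chain_delay(length, t_fa, t_sys, sigma_fa, rng) > t_clk
                 for _ in range(trials))
    return misses / trials
```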
0:04:39 Continuing with the data path error models: here is a plot that shows error probability as a function of voltage level and bit position. This is 45 nanometers, with a critical path of 400 picoseconds in this particular case. You can see that you only get errors in the most significant bit positions; obviously they increase significantly as my voltage levels go down, and in the LSBs it is pretty much flat. This is the PDF of errors for a bit error rate of 10^-4, the one that most of the talk will focus on. We had more colours here; anyway, the data path error looks like this: these are the MSB positions, this is your LSB, this is your MSB. The memory error is fairly constant, fairly uniform across the bit positions up to the MSB, and this is your overall error, which is the sum. I am jumping ahead a little bit here.
0:05:39 These numbers came from a baseline JPEG, where this is a two-dimensional DCT block: if I have an error during the row computations, that error is propagated into the column computations. Otherwise, we assume that the data path errors and the memory errors are independent.
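That propagation follows from the separable row-column structure of the 2-D DCT, sketched here with a placeholder `dct1d` standing in for any 1-D transform:

```python
def dct2_separable(block, dct1d):
    """Separable 2-D transform: apply the 1-D transform `dct1d` to every
    row, then to every column of the result. Any error injected during a
    row pass feeds all of the column passes, which is why row errors are
    modelled as propagating into the column computations."""
    rows = [dct1d(list(r)) for r in block]
    cols = [dct1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```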
0:05:58 So the whole thing was implemented, though not as a full chip implementation; parts of it were implemented in hardware.
0:06:11 Now, for the errors to be compensated for, these are the techniques that we looked at. A couple of things I want to show here. One is the magnitude of the coefficients with respect to the quality metric used; a high Q means good quality. What you notice is that for these DCT coefficients, where you are looking at the zigzag scan of the 2-D DCT, so you have the DC coefficient, the next index is AC one, the next is AC two, and so on, doing a zigzag scan over here, many of these coefficients are very small independent of the value of Q you have. This is one point to note.
0:06:51 The second point to note is this particular plot, which shows, across twenty blocks, where this is your DC and this is your sixty-fourth AC coefficient, that the magnitude of the DC coefficient varies significantly from block to block, but if you look at this whole plane it is almost the same; there are very few entries that differ. So one more thing you see is that the AC coefficients representing the same frequency in neighbouring blocks are similar, around the same, in a similar range. And the other thing, of course, is that the magnitude is very small here for the high-frequency coefficients, which is why the quantization matrix looks the way it does.
0:07:38 Okay, so now there are errors in the system; how are you going to detect them? There are three things up there. First, what we did is assume that you have a quality factor Q; if I have a high Q, that means high quality over here. For high quality, group one consists of DC through AC fifteen, group two consists of AC sixteen through the next sixteen coefficients, and so on. So that is how you break up the sixty-four coefficients. You can see that group four, which is the high-frequency group, can be handled with a very few number of bits.
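The grouping can be sketched as below. The per-group bit widths are hypothetical: the talk only says that the high-frequency group needs very few bits.

```python
# Hypothetical bit budget per group of 16 zigzag-ordered coefficients;
# the actual widths used in the design are not stated in the talk.
GROUP_BITS = [11, 8, 5, 3]   # group 1 (DC..AC15) ... group 4 (AC48..AC63)

def group_of(zigzag_index):
    """Group number (1-4) of a coefficient at zigzag position 0..63."""
    return zigzag_index // 16 + 1

def bits_for(zigzag_index):
    return GROUP_BITS[group_of(zigzag_index) - 1]
```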
0:08:14 And this is something that has been used in designing low-power chips by others. There was work done ten or eleven years ago by a professor and one of his students at MIT, where they dropped the number of coefficients. Actually, it is a slightly different trick: it comes from a coarse-quantization viewpoint, because they essentially reduced the number of bits at each level. We are not doing that; we are assuming that somebody else designed the chip, everything is done, and you are looking at the output, but we are playing tricks on top of it.
0:08:54 The first technique is very, very simple: it detects errors that are in the sign-extension bits of the coefficients. For instance, if I were in group two, after the quantization I would only have five bits, with three bits of sign extension. If those three sign bits are not all the same, I know that there is an error, so it detects. And then you just take the three sign-extension bits and take a majority vote to find the correct value; if you have more bits, you can do more, of course. This is applicable only to the top couple of bit positions.
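Step one can be sketched like this, assuming a two's-complement word of `width` bits of which only `data_bits` are significant after quantization, so the top bits are sign extension and a majority vote can restore them:

```python
def detect_and_correct(word, width, data_bits):
    """Check the sign-extension field of `word`: the sign bit and all
    extension bits above the `data_bits` significant bits should be
    equal. If they disagree, an error is detected and the field is
    restored by majority vote, as in step one of the talk."""
    hi = range(data_bits - 1, width)          # sign bit + extension bits
    ext = [(word >> i) & 1 for i in hi]
    if len(set(ext)) == 1:
        return word, False                    # no error detected
    majority = 1 if 2 * sum(ext) > len(ext) else 0
    for i in hi:
        word = (word & ~(1 << i)) | (majority << i)
    return word, True
```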
0:09:30 The second technique builds on the fact that coefficients that are adjacent to each other have similar magnitudes. They could be adjacent in the same block, two frequencies next to each other, or the same frequency in two neighbouring blocks. Essentially, what you are doing is detecting an error when there is an abnormal increase in magnitude in one of the positions with respect to its neighbours.
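A minimal sketch of that second check, with a hypothetical threshold (the talk does not give the actual decision rule):

```python
def is_suspect(value, neighbours, threshold=4.0):
    """Flag an abnormal magnitude jump relative to neighbouring
    coefficients: the same frequency in adjacent blocks, or adjacent
    frequencies in the same block. `threshold` is a placeholder."""
    ref = max(abs(n) for n in neighbours)
    return abs(value) > threshold * max(ref, 1)
```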
0:09:51 A couple of things happen around the quantization. Knowing that there is an error, this combats most of the errors coming from the data path in this particular case, but then you do still have some errors in the lower-order bits that appear during the dequantization: you sent things with a few number of bits, but then you have to sign-extend them again, and so on, at the decoder level.
0:10:19 So here the processing is a little bit different: you check the neighbours of the current bit, that is, you look at neighbouring blocks at the same frequency and at the same block at neighbouring frequencies, and then correct accordingly. The hardware overhead for this is small, as I will show.
0:10:37 For the simulations, we are comparing everything with respect to a baseline JPEG. All the performance is PSNR versus compression rate. The results I will show you here are for these four images; since then we have done the same thing for other images, and the trends are pretty much the same, so I will only show each result for these four.
0:11:03 Now, the performance. To read the results that I am about to show you: we built hardware only for the overhead part. In other words, we did not implement the whole baseline JPEG; all that was implemented in hardware is just the overhead circuitry to detect the errors. The synthesis was done using Design Compiler from Synopsys, targeting 45 nanometers.
0:11:28 Okay, so the results. This is the simulation result for the bridge image; the others are very similar, and I will summarise them afterwards. Here you are looking at PSNR versus compression rate in bits per pixel. Let us start from the top. This is your error-free curve: for the bridge image, this is how the curve should look. This one is if there were errors in the system and you did not do anything, your no-correction curve. Our method allows you to operate up here. At about 0.75 bits per pixel, which is, you know, a common comparison point, you are 1.5 dB below where you would be with no errors, and you are 4 dB better than if you did not do anything.
0:12:25 And how much is the power gain for this? A rough estimate: if I assume that, in a baseline JPEG, the parts that we are looking at, which are the DCT and the quantization before the entropy coder, account for about sixty to seventy percent of the power consumption, and the voltage drops from about one volt to about 0.8, you are getting a significant reduction.
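As a rough sanity check on that estimate: dynamic power scales roughly with the square of the supply voltage, so the overall saving from scaling only the DCT and quantization stages can be approximated as:

```python
def power_reduction(v_nom, v_scaled, affected_fraction):
    """Approximate overall dynamic-power saving (P ~ C * V^2 * f) when
    only `affected_fraction` of the design runs at the scaled supply."""
    stage_saving = 1.0 - (v_scaled / v_nom) ** 2
    return affected_fraction * stage_saving
```

With the numbers in the talk (one volt down to 0.8 on sixty to seventy percent of the power), this works out to roughly a 22 to 25 percent overall reduction.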
0:12:50 This is for a bit error rate of 10^-3, which is more aggressive; you have scaled the voltage further, but you are worse off in quality. So with the algorithm-specific technique that I talked about, at 0.75 bits per pixel you are about 4 dB below the normal error-free curve, and, as a rough number, you are 9 dB better than if you had not done anything and just scaled the voltage. These are the same curves for 10^-4, which we mostly focused on; the 10^-3 case was harder to address. At a bit error rate of 10^-4 and 0.75 bits per pixel, you get a little bit more than a 4 dB improvement, and the average degradation is about 1.8 dB from the error-free case.
0:13:46 Next, the circuitry overhead. It is actually very small. For the first step you only needed a majority voter for three bits, not a big deal. The second one was just a coefficient comparator, which is probably one of the larger ones, and this is the magnitude correction unit that you needed for step three. Like I said, this is just the circuitry overhead of the three methods; we did not implement all of baseline JPEG.
0:14:19 So, to conclude: the proposed algorithm-specific technique essentially exploits the characteristics of the quantized coefficients, and this is not something that will work only for JPEG. There has been other work along the same lines for other image and video codecs, exploiting the fact that the coefficients are correlated, in the transforms in some cases. The idea has been used for reducing the number of bits that you require and for correcting errors if there are errors introduced into the system, be they errors in the memory or errors in the data path. The core of the technique improves the PSNR performance, like I said, by about four dB compared to the no-correction case, with a degradation of less than two dB compared to the error-free case.
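For reference, the PSNR figures quoted throughout are the standard peak-signal-to-noise-ratio metric, which can be sketched as:

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    mse /= len(original)
    if mse == 0:
        return float('inf')          # identical images
    return 10.0 * math.log10(peak * peak / mse)
```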
0:15:13 That concludes my talk. I would be happy to take some questions now.
0:15:21 Any questions from the audience?
0:15:31 You compensate for the errors happening in the DCT; would it be possible to have errors also in your compensation circuitry?
0:15:42 The delay of the critical path there is so incredibly small at the voltages that I am operating at. These results were obtained at the same operating voltage, so let us say I am operating at 0.85 or 0.7 volts: there will be no errors from voltage over-scaling, for sure. I mean, errors could come from something else, but then that would be an issue in the other system as well. The point is that the critical path there is a very small fraction of the data path's, so you will not have those kinds of errors.
0:16:19 Any other questions?
0:16:22 Thank you, professor.