0:00:13 [Speaker] Thank you for the introduction. Let me start with the motivation for this work.
0:00:26 This talk is about searching in large databases of multimedia, and in particular about indexing very large image databases. In a state-of-the-art scheme, an image is represented by about one thousand or two thousand local descriptors. It means that we quickly have to handle billions of descriptors once the database gets large — here, SIFT descriptors, which are of dimension 128.
0:00:59 If we look at video, the numbers are similar: we would like to index thousands of hours of video. For instance, in the TRECVID evaluation task there are about two hundred hours of video, which are also represented by billions of local descriptors. And in some music retrieval systems, the database cardinality is likewise on the order of one billion.
0:01:31 To take a more concrete example: for our participation in the TRECVID copy detection task, we extracted local descriptors from the whole video set, and representing the database required on the order of a billion descriptors.
0:01:55 What we do at search time is the following: for each query image, we search its descriptors against the database descriptors. That means that for each query descriptor, we look for its nearest neighbors with respect to the Euclidean distance.
0:02:15 Now, consider exhaustive nearest-neighbor search, discarding the I/O cost. With one query frame described by about one thousand descriptors, we have to make trillions of elementary vector-to-vector comparisons, which takes on the order of ten hours — for a single frame. So we simply cannot do it. That is why we need a powerful approximate nearest-neighbor search technique, one which in our situation is efficient enough to avoid this problem, but which is also memory efficient.
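To make the order of magnitude concrete, here is a rough back-of-the-envelope sketch; every count below (database size, throughput) is an illustrative assumption, not a figure from the talk:

```python
# Rough cost model for exhaustive nearest-neighbor search on SIFT-like data.
n_database = 2_000_000_000     # ~2 billion database descriptors (assumed)
n_query = 1_000                # descriptors describing a single query frame
dim = 128                      # SIFT dimensionality
flops_per_distance = 3 * dim   # subtract, square, accumulate per component

total_flops = n_database * n_query * flops_per_distance
throughput = 2e10              # assumed ~20 GFLOP/s sustained on one machine

hours = total_flops / throughput / 3600
print(f"{total_flops:.1e} FLOPs -> ~{hours:.0f} hours for one frame")
```

With these assumptions the single-frame cost already lands around ten hours, which is the regime the talk describes.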
0:02:48 So what does approximate search mean? Such a search is allowed to trade some accuracy for speed: it is acceptable as long as it retrieves most of the actual nearest neighbors with high probability. That is why we make approximate search. But most existing methods are mainly optimized for speed, not for memory. For instance, consider locality-sensitive hashing (LSH), which is very popular because it has some good theoretical properties, but which is quite memory consuming: even if you use only a single hash table, you need at least four bytes per vector just for the identifiers, and that is a strict minimum; moreover, to get very good performance you need some adaptation to the data, which plain LSH does not provide.
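The memory remark is easy to quantify; the table count below is an assumption for illustration, since classical LSH deployments use many hash tables:

```python
n = 1_000_000_000   # target database size, as in the talk's motivation
id_bytes = 4        # at least one 4-byte identifier per vector, per table
tables = 20         # plausible number of LSH hash tables (assumed)

gib = n * id_bytes * tables / 2**30
print(f"{gib:.0f} GiB of identifiers alone")   # ~75 GiB, before any descriptor
```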
0:03:41 A better choice is to select FLANN, which is a very good package in terms of the speed/accuracy trade-off — it is the state of the art. It relies on tree structures, which you traverse at query time; but again, the trees require a lot of memory.
0:04:00 We made an experiment: when we indexed one million vectors, the index alone used on the order of one hundred megabytes — and that is for one million vectors only.
0:04:11 So typical approximate search methods proceed in two stages. The first stage is the approximate search itself. You have some kind of space partitioning — a hash function in LSH, or a k-means partitioning — and for a given vector you can find the cell in which this vector lies. You do this offline for all the database vectors. Then, when you have a query, you compute the cell in which it lies, and you retrieve the vectors of the same cell as potential nearest neighbors. That is what one partitioning function gives you; in LSH you have several hash partitions, to improve the probability of catching the true nearest neighbor.
0:05:02 Then, when you have these potential nearest neighbors, many of them are not very good — they are actually very far from the query vector. That is why you typically make a verification, based on exact Euclidean distance calculations. These are the two stages of such approximate methods. The good thing is that if the true nearest neighbor survives the first stage, you are sure that it will be ranked in a good position by this exact calculation.
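As a schematic illustration of this two-stage principle — cell assignment offline, probing plus exact verification at query time — here is a minimal sketch; it is a toy stand-in under simplified assumptions (a k-means-like partition with centroids sampled from the data), not the implementation of any particular system:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 16, 10_000, 64
data = rng.standard_normal((n, d)).astype(np.float32)

# Offline: partition the space into k cells (centroids sampled from the data;
# a real system would use a trained quantizer or a hash family).
centroids = data[rng.choice(n, k, replace=False)]
assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

def search(query, n_probe=4):
    # Stage 1 (approximate): keep vectors falling in the n_probe closest cells.
    cell_dist = ((centroids - query) ** 2).sum(-1)
    probed = np.argsort(cell_dist)[:n_probe]
    candidates = np.flatnonzero(np.isin(assign, probed))
    # Stage 2 (verification): exact distances on the short candidate list only.
    exact = ((data[candidates] - query) ** 2).sum(-1)
    return candidates[np.argmin(exact)]

query = rng.standard_normal(d).astype(np.float32)
print(search(query))   # id of the best verified candidate
```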
0:05:37 The problem is that, for this second stage, we need the descriptors themselves. It means that we still have to use a huge amount of memory: for instance, for one billion vectors, it means more than one hundred gigabytes. So you have to put the descriptors on disk, and in that case the disk access kills the efficiency of the scheme, because in practice you have to check too many neighbors. Otherwise you simply cannot do it, and you are limited to a much smaller number of vectors.
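The memory arithmetic behind that figure is straightforward, assuming SIFT descriptors stored with one byte per component:

```python
n = 1_000_000_000
bytes_per_descriptor = 128   # a SIFT vector: 128 components, 1 byte each
print(f"{n * bytes_per_descriptor / 2**30:.0f} GiB")   # ~119 GiB, raw data only
```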
0:06:12 What we propose in this paper is an alternative scheme for that verification stage, based on source coding.
0:06:22 Before that, I have to introduce a previous work of ours on approximate search, where we used a compression-based approach. That is, we represent each database vector by a compressed representation — a short, concise code. This is done using a product quantizer, which provides a huge number of possible quantization values while keeping the codebooks tractable, which an ordinary quantizer could not do. The search is then cast as a distance approximation problem.
0:07:00 That is, instead of computing the true distance between x and y, you approximate it by the distance between x and q(y), the quantized version of y. There is a bias term that could be used as a correction, but in practice this bias matters little for the ranking. An interesting property is that we can make this distance estimation directly in the compressed domain: we do not have to decompress the database vectors to make the comparison.
0:07:33 You may be familiar with the trick: for a query, we build a table of distances to the codebook vectors, which replaces the explicit distance computations by table lookups. Using this scheme, you obtain almost the same efficiency as competing methods, but much better accuracy, as I could show you on some curves. We also have some proved average error bounds on the distance estimation.
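The following sketch illustrates this product-quantization idea with asymmetric distance computation (an unquantized query compared against 8-byte codes through lookup tables). It is a deliberately minimal toy: the codebooks are random samples rather than the k-means codebooks the method actually trains, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 128, 8, 256      # 8 sub-quantizers x 256 centroids -> 8-byte codes
ds = d // m                # dimension of each sub-vector

# Toy codebooks: random training points per sub-space (a real product
# quantizer would run k-means independently in each sub-space).
train = rng.standard_normal((10_000, d)).astype(np.float32)
codebooks = np.stack([
    train[rng.choice(len(train), k, replace=False), j*ds:(j+1)*ds]
    for j in range(m)
])                         # shape (m, k, ds)

def encode(y):
    """Compress y into m one-byte centroid indices (the code kept in RAM)."""
    return np.array([
        np.argmin(((codebooks[j] - y[j*ds:(j+1)*ds]) ** 2).sum(-1))
        for j in range(m)
    ], dtype=np.uint8)

def adc(x, codes):
    """Asymmetric distances: x stays unquantized, database vectors are codes.
    Per-sub-space distance tables are computed once, then only looked up."""
    tables = np.stack([
        ((codebooks[j] - x[j*ds:(j+1)*ds]) ** 2).sum(-1)
        for j in range(m)
    ])                     # shape (m, k)
    return tables[np.arange(m), codes.astype(np.int64)].sum(axis=-1)

base = rng.standard_normal((1_000, d)).astype(np.float32)
codes = np.stack([encode(y) for y in base])   # (1000, m): 8 bytes per vector
x = base[42] + 0.05 * rng.standard_normal(d).astype(np.float32)
print(np.argsort(adc(x, codes))[:3])          # 42 should rank near the top
```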
0:08:01 So now let me focus on the second stage, the re-ranking stage, knowing that the first stage is this compression-based search. The good point of a compression-based first stage is that, for each database vector, we have an explicit — approximate — reconstruction of the vector. So, instead of using the whole descriptor for the second stage, we are going to refine the reconstruction of the database vectors ranked first by the first stage. This is done by first computing the residual error vector: the small difference vector between y and its first reconstruction q(y).
0:08:47 And the key point is that we do not want to store this residual explicitly, because it has the same dimensionality as the original descriptor. Instead, the residual is itself quantized, using a quantizer adapted to its distribution.
0:09:03 So now we can approximate y as the first approximation obtained from the first stage, plus the quantized version of the residual. Adding this quantized residual improves the initial estimate, both as a reconstruction and for the distance calculation. And we can trade precision against memory through the number of bytes we dedicate to coding this refinement. Typical parameters: the first code is eight bytes, plus a small number of bytes to encode the residual of each indexed vector.
0:09:50 The search algorithm is then quite simple. Take a database vector y: the first stage represents it by a code, which can be seen in the original Euclidean space as q(y). The first stage compares the query x with all these coded approximations and produces a shortlist of vectors. Then, for each y selected in the shortlist as a potential nearest neighbor, we explicitly reconstruct the improved estimate: we refine q(y) by adding the quantized residual, which gives the estimate y-hat. The new distance to x is computed with respect to this refined reconstruction, and it is a better approximation of the true distance between x and y.
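A minimal sketch of this refinement idea — a coarse first-stage code, plus a separately quantized residual used only to re-rank a shortlist — might look as follows. Plain k-means quantizers stand in for the product quantizers of the actual method, and all sizes are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=15):
    """A deliberately minimal Lloyd iteration, standing in for a real trainer."""
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        a = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(axis=0)
    a = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
    return C, a

d, k1, k2, n = 32, 64, 64, 5_000
base = rng.standard_normal((n, d)).astype(np.float32)

# First stage: each y is stored only as the index of its centroid, q1(y).
C1, code1 = kmeans(base, k1)

# Refinement: quantize the residual r = y - q1(y) with a second codebook.
C2, code2 = kmeans(base - C1[code1], k2)

def search(x, shortlist=100):
    # Stage 1: rank all vectors by distance to the coarse reconstruction q1(y).
    cand = np.argsort(((C1[code1] - x) ** 2).sum(-1))[:shortlist]
    # Stage 2: re-rank the shortlist with y_hat = q1(y) + q2(y - q1(y)),
    # still without ever touching the raw database vectors.
    refined = C1[code1[cand]] + C2[code2[cand]]
    return cand[np.argsort(((refined - x) ** 2).sum(-1))]

x = base[7] + 0.05 * rng.standard_normal(d).astype(np.float32)
print(search(x)[:3])   # index 7 should come back at or near the top
```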
0:10:51 Let us look at some search results on one billion vectors. In this case the first stage used eight bytes per vector; the first curve is the performance without re-ranking. The plot shows recall@R: measured over a large set of queries, the rate at which the true nearest neighbor is ranked in the first position, within the first ten positions, within the first hundred positions, and so on. You have to keep in mind that the rank here can be anything up to one billion. You can see that the first stage alone attributes a rather crude rank: it places the true neighbor within the first thousand positions, but that is not enough.
0:11:39 Now, if you re-rank the shortlist using only eight extra bytes per vector, you get a very good improvement. You can go further with sixteen bytes. And the performance converges when using one hundred twenty-eight bytes per vector, which means that the true nearest neighbor is then almost always ranked in the first position.
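Recall@R, the measure plotted here, is simple to compute once the ground-truth nearest neighbor of each query is known; a small self-contained sketch:

```python
import numpy as np

def recall_at_r(ranked_ids, ground_truth, r_values=(1, 10, 100, 1000)):
    """ranked_ids: (n_queries, max_rank) result lists; ground_truth:
    (n_queries,) true nearest-neighbor ids. Returns, for each R, the
    fraction of queries whose true neighbor appears in the first R results."""
    hits = ranked_ids == ground_truth[:, None]
    return {r: float(hits[:, :r].any(axis=1).mean()) for r in r_values}

# Tiny usage example with fabricated result lists:
ranked = np.array([[3, 7, 1], [5, 2, 9]])
truth = np.array([7, 9])
print(recall_at_r(ranked, truth, r_values=(1, 2, 3)))  # {1: 0.0, 2: 0.5, 3: 1.0}
```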
0:12:05 I should add that the re-ranking stage has a computational cost which is almost negligible compared with the first stage. In terms of overall timings, we observe query times ranging from a few milliseconds up to roughly two hundred milliseconds if you want to be almost sure of getting the true nearest neighbor among one billion vectors.
0:12:30 The take-home message about this re-ranking is the following: for a fixed total memory budget per vector, it is better to spend fewer bytes on the first stage and more on the refinement, because you get higher efficiency in the first stage with comparable — in fact improved — precision, thanks to the re-ranking stage based on source-coding refinement.
0:12:58 Before concluding, I would like to mention that we have put online a dataset of one billion vectors. The reason we have done this is that many papers on approximate search evaluate on one million vectors, while in fact, as I showed at the beginning, actual systems such as copy detection have to handle one billion vectors, not one million. So we have released this set, together with the ground truth: we extracted ten thousand queries, and for each of them — this takes quite a long time — we computed the exact nearest neighbors, that is, all the distances to the database vectors. We provide the identifiers of the true first thousand neighbors and the corresponding distances, in case you want to compute rank-based measures.
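Producing such exact ground truth is just a (long) brute-force pass over the database; schematically, streaming the database in chunks and keeping a running top list per query — a sketch, with synthetic data standing in for the real descriptors:

```python
import numpy as np

def exact_ground_truth(queries, database_chunks, top=1000):
    """Stream the database chunk by chunk, keeping for each query the `top`
    smallest squared distances (and matching vector ids) seen so far."""
    nq = len(queries)
    best_d = np.full((nq, top), np.inf, dtype=np.float32)
    best_i = np.full((nq, top), -1, dtype=np.int64)
    offset = 0
    for chunk in database_chunks:
        dist = ((queries[:, None, :] - chunk[None, :, :]) ** 2).sum(-1)
        ids = np.broadcast_to(np.arange(offset, offset + len(chunk)), dist.shape)
        # Merge the chunk with the running top list and keep the best `top`.
        all_d = np.concatenate([best_d, dist], axis=1)
        all_i = np.concatenate([best_i, ids], axis=1)
        order = np.argsort(all_d, axis=1)[:, :top]
        best_d = np.take_along_axis(all_d, order, axis=1)
        best_i = np.take_along_axis(all_i, order, axis=1)
        offset += len(chunk)
    return best_i, best_d

# Tiny usage example:
rng = np.random.default_rng(0)
queries = rng.standard_normal((10, 8)).astype(np.float32)
chunks = (rng.standard_normal((200, 8)).astype(np.float32) for _ in range(5))
ids, dists = exact_ground_truth(queries, chunks, top=5)
print(ids[0], dists[0])
```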
0:14:00 To conclude my talk: we have proposed a source-coding-based re-ranking approach that avoids using the whole descriptors in the verification stage, and therefore reduces the memory requirements; it improves the trade-off between efficiency and precision for a fixed memory usage. We have also released a one-billion-vector dataset, with ground truth, for the evaluation of approximate search. Finally, I should mention that we have put online a package for the compression-based methods, which reproduces the results for the nearest-neighbor search and the source-coding refinement I have presented. Thank you.
0:14:47 [Session chair] Are there any questions?
0:14:57 [Audience] A very short question: how dependent is this technique on using SIFT per se?
0:15:04 [Speaker] Do we have to use SIFT with this technique? No — nothing in the scheme is specific to SIFT. We ran these experiments on SIFT descriptors, but I think it can be applied to other kinds of vectors as well.
0:15:25 [Session chair] Any other questions?
0:15:29 [Session chair] I have one more question myself, actually. You do this refinement iteration once; what prevents you from iterating again, and refining further? Given the quantization errors, if you make the quantization error very small, you could drive it toward zero.
0:15:47 [Speaker] Of course — it is a good question. We did not try to jointly optimize the first stage and the second stage, and adding further stages would amount to some kind of successive refinement; that would be a good direction. There are also several parameters to tune: how many neighbors to keep in the shortlist, and how many bytes to give to each stage.
0:16:16 [Audience] I remember my first course on quantization, and the assumption you make there is that the quantization error is uniform within a bin — the points are evenly distributed inside the bin. So I am wondering: if you do one or more levels of quantization, wouldn't that make further quantization very hard, because there is very little structure left to take advantage of?
0:16:51 [Speaker] Yes, I think that is true. At some point, when you use a very fine quantizer in the first stage, the residual looks like white noise and there is little structure left to exploit — the same phenomenon as in compression, where the finer the first encoding, the less structure remains in what you encode next. And it is especially a problem with high-dimensional data, because all the points become nearly equidistant. But here the first quantizer is coarse, so the residual still keeps some structure, and the refinement does improve the estimate compared with not using it.
0:17:43 [Session chair] Okay — thank you.