0:00:13 Yeah, thank you very much for the introduction. Today I would like to present a robust image retrieval approach for mobile location recognition. As you all know, the availability of GPS is limited to scenarios where we have only very few obstacles, and we also get very low signal reception in urban canyons. This makes it hardly possible to obtain any position fix in, for instance, train stations, libraries, and so on.
0:00:40 But these are actually the places where we have the most interesting location-based services. Rather than relying on localisation systems based on WiFi fingerprints, which require infrastructure to be installed in advance, we would like to use images recorded by the mobile device and match them to a visual map, like a street view, which would allow us to derive the pose in a very natural way, directly from the image.
0:01:11 To do that, we use content-based image retrieval to match the query images to the reference map. Among those retrieval approaches, the so-called feature-based approaches are considered state of the art, and when applying them to the task of location recognition, several challenges arise.
0:01:31 As you can see in the image here, which we took with a mobile device, the query pictures are supposed to be matched to the reference data, that is, to the visually most similar reference image, which is depicted on the right. You see that there is a large baseline between the two images, caused by the fact that we only have sparse reference data: the distance between the panoramas is approximately twelve point five meters.
0:02:02 Further, we have to cope with very different lighting conditions, dynamic objects like cars and pedestrians, and also a very complex query domain. Most importantly, we require very low retrieval times, because the user's position and attention, and thus the field of view, are constantly changing. So low latency will be very essential.
0:02:23 We can achieve that by first extracting the features on the mobile device at very low cost. To this end we can use rotation-invariant fast features, which have been recently proposed and which require approximately twenty-seven milliseconds per regular frame on a Nexus One.
0:02:45 Now that we have these features, we want to transmit them to the server, and we can do so very efficiently by transmitting only the visual word indices. As you know, a visual word is a cluster of features, so this representation is very compact; in fact, this approach needs about five times less rate than compressed descriptor histograms. But to do that, we require feature quantisation into visual words on the mobile device, at very low complexity.
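To make the rate advantage concrete, here is a back-of-the-envelope sketch of transmitting visual word indices instead of quantised descriptors; the vocabulary size, feature count, and descriptor layout are assumed toy values, not figures from the talk:

```python
import math

# Assumed toy values (not from the talk): a vocabulary of one million visual
# words and one thousand features per query image.
vocab_size = 1_000_000
n_features = 1000

bits_per_index = math.ceil(math.log2(vocab_size))      # 20 bits per word index
index_payload_bytes = n_features * bits_per_index / 8  # send only word indices

# For comparison: sending raw 64-dimensional descriptors at 8 bits per dimension.
descriptor_payload_bytes = n_features * 64 * 1

print(index_payload_bytes, descriptor_payload_bytes)   # 2500.0 64000
```

Under these assumptions, the index payload is more than an order of magnitude smaller than the raw descriptors, which is why the quantisation has to happen on the device before transmission.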
0:03:20 So this is what we are talking about in this presentation, and the outline is as follows: we first discuss the state of the art and related work, then introduce the multiple hypothesis vocabulary tree, provide some details on its quantisation structure, its adaptive clustering approach, and its visual word weighting. We then compare it against the state of the art using experimental validation, and conclude the presentation with a short summary.
0:03:46 As you all know, robust and fast quantisation of features into visual words is of essential importance for the performance of these algorithms. Among the best-known algorithms is the so-called hierarchical k-means, which recursively quantises the descriptor space by applying the k-means algorithm. This gives us the so-called vocabulary tree, where the leaf nodes are the so-called visual words and form the vocabulary. This approach can be efficiently improved using the so-called greedy search, which considers multiple branches of the vocabulary tree to find the closest visual word to a query descriptor; this can be seen as a kind of backtracking within the vocabulary tree. The Hamming embedding, on the other hand, stores strongly quantised signatures of the individual descriptors to allow for further differentiation within one visual word. Last but not least, the approximate k-means generates a flat vocabulary by applying the k-means algorithm a single time to billions of features; to cope with the computational complexity, it uses an approximate nearest neighbor search based on randomized k-d trees.
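As a rough sketch of how such a vocabulary tree maps a descriptor to a visual word: at each inner node we descend into the child with the closest centroid, until a leaf (visual word) is reached. The toy tree, its centroids, and the two-dimensional descriptors below are invented for illustration, not an actual vocabulary:

```python
# Minimal vocabulary-tree descent: at each inner node, follow the child whose
# centroid is closest to the descriptor; leaves are the visual words.

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class Node:
    def __init__(self, centroid, children=(), word_id=None):
        self.centroid = centroid
        self.children = list(children)
        self.word_id = word_id      # set only on leaves (visual words)

def quantize(node, descriptor):
    while node.children:
        node = min(node.children, key=lambda c: dist2(c.centroid, descriptor))
    return node.word_id

# Toy tree with branching factor 2 and depth 2 -> four visual words.
tree = Node(None, [
    Node((0.0, 0.0), [Node((0.0, 0.0), word_id=0), Node((0.0, 1.0), word_id=1)]),
    Node((1.0, 1.0), [Node((1.0, 0.0), word_id=2), Node((1.0, 1.0), word_id=3)]),
])

print(quantize(tree, (0.9, 0.2)))  # -> 2
```

The greedy search mentioned above extends this descent by keeping several candidate branches open instead of committing to a single child at each level.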
0:05:02 Now, to evaluate these different algorithms, we apply them to a typical location retrieval task in an area of approximately four square kilometres, including about five thousand panoramas, each composed of twelve rectified images. The query images have a size of six hundred forty by four hundred eighty pixels and are represented by about a thousand features each on average. As you can see in the illustration here, the small circles represent the panoramas; the distance between them is about twelve point six meters, and the query image is placed right between them, shifted to the left by forty-five degrees, with an opening angle of about sixty degrees. Now we would like to compare against the related work, the state of the art, by precision-recall measures. To attain a recall of one, we require the algorithms to retrieve the two closest panoramas, or other ones within ten meters. Correspondingly, a precision of one is achieved if these two panoramas are ranked first.
0:06:13 Now, if you take a look at this graph, you see that we do not have only one precision-recall pair but multiple, since we successively consider up to five percent of the database. Obviously, the probability that we retrieve the relevant panoramas is high if we consider many samples, and lower if we consider only a few; on the other hand, the precision then drops, as we also include irrelevant panoramas, of course.
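The evaluation just described can be sketched like this; the ranking and the set of relevant panoramas are invented toy data:

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall when the top-k retrieved items are considered."""
    retrieved = set(ranking[:k])
    hits = len(retrieved & relevant)
    return hits / k, hits / len(relevant)

# Toy example: the two relevant panoramas for this query are 7 and 3.
ranking = [7, 12, 3, 44, 9]
relevant = {7, 3}

print(precision_recall_at_k(ranking, relevant, 1))  # (1.0, 0.5)
print(precision_recall_at_k(ranking, relevant, 3))  # ~(0.667, 1.0)
```

Sweeping k over a growing fraction of the database yields exactly the multiple precision-recall pairs that make up one curve in the graph.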
0:06:38 If we now compare the different approaches, we see that the hierarchical k-means is inferior to the others, but it requires only sixty L2 distance computations, which makes it very fast. It can be efficiently improved by applying the greedy search to the HKM, at the cost of an increased query time: we then require about five hundred and ten L2 distance computations to achieve this graph here. The Hamming embedding requires only one third of that computational complexity, but at increased memory requirements, as the strongly quantised descriptors have to be stored on the mobile device, which is too demanding. The AKM, last but not least, is set to perform one hundred and ninety-two L2 distance computations with eight randomized k-d trees.
0:07:25 As you can see, despite its inferior retrieval performance, the HKM is still the most suitable for our specific mobile location recognition task, and it was also used in the paper proposing the coding of features as visual word indices. It requires twenty-five milliseconds on a two point four gigahertz desktop CPU. Unfortunately, we do not have such desktop CPUs on mobile devices; there we are in the range of about one gigahertz. So we have to find even faster approaches, and to this end we use the multiple hypothesis vocabulary tree. We will now go a little bit into its details, starting with its quantisation structure.
0:08:09 First of all, as you all know, with an increase of the branching factor we improve the retrieval performance. However, this ultimately leads to a linear search and thus an enormous computational complexity, if we just search through all possible visual words. Since we want to minimize the query time to achieve mobile location recognition, we limit the MHVT to binary decisions. That means we separate the descriptors along the axis of maximum variance, which is indicated by the vector here. More precisely, we split along a separating hyperplane, which is placed at the mean of the descriptors that fall within a particular node. There are obviously descriptors that are very close to this hyperplane, so the probability that a matching query descriptor lies just on the other side of the separating hyperplane is high. To avoid these ambiguous decisions, we apply a so-called overlapping buffer around the separating hyperplane, whose width is defined by the variance of the data.
0:09:15 This is analogous to so-called spill trees: if a database feature falls into this overlapping buffer, it is not separated but assigned to both child nodes. This allows us to avoid the ambiguous decisions for features that are very close to the separating hyperplane. Altogether, we let the database descriptors follow the multiple hypothetical paths through the tree that a query descriptor could travel, and this makes us particularly robust against viewpoint variations, which could stem, for instance, from wide baselines.
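A minimal sketch of one such binary split with an overlapping buffer, assuming the buffer half-width is a fixed multiple of the standard deviation along the split axis (the talk describes the width rule only qualitatively, so `buffer_scale` is an assumed parameter):

```python
import statistics

def split_with_overlap(points, buffer_scale=0.25):
    """One MHVT-style binary split (illustrative sketch):
    split along the axis of maximum variance at the mean, and assign points
    falling inside the overlapping buffer to BOTH children."""
    dims = len(points[0])
    variances = [statistics.pvariance([p[d] for p in points]) for d in range(dims)]
    axis = max(range(dims), key=lambda d: variances[d])
    mean = statistics.fmean(p[axis] for p in points)
    half_width = buffer_scale * variances[axis] ** 0.5  # width from the std-dev

    left = [p for p in points if p[axis] <= mean + half_width]
    right = [p for p in points if p[axis] >= mean - half_width]
    in_buffer = [p for p in points if abs(p[axis] - mean) <= half_width]
    return axis, mean, left, right, in_buffer

# Toy 2-D descriptors; the two middle points straddle the split and spill
# into both children.
axis, mean, left, right, buf = split_with_overlap(
    [(0.0, 0.0), (1.4, 0.0), (1.6, 0.0), (3.0, 0.0)])
print(axis, len(left), len(right), len(buf))  # 0 3 3 2
```

Note that the fraction of points inside the buffer, `len(buf) / len(points)`, is a natural quantity to threshold when deciding whether further splitting of a node is still useful.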
0:10:02 Now that we can recursively subdivide the descriptor space, we could continue until a certain maximum number of features per node is reached. However, as we consider large datasets, this can result in differently sized descriptor clusters, which could stem, for instance, from different occurrence frequencies of certain textures, such as windows. To avoid overfitting to such descriptor clusters, we want to stop the separation once the descriptor cluster is close to a hypersphere, which means that the cluster is consistent in itself and no further separation is necessary or useful. An efficient approximation is to look at the ratio of features that are assigned to the overlapping buffer. Here you see a node that can be separated very well: a strong variance can be observed in this direction. On the other hand, here you have a different node, where the variance is almost the same in all directions and a large fraction of the features is assigned to the overlapping buffer; this would mean that many more quantisation steps would be required to separate this cluster. So we stop the separation process once a certain fraction of the features falls into the overlapping buffer.
0:11:25 This not only avoids the overfitting effects, and thus improves the retrieval performance, but also reduces the size of the tree and thus the quantisation time.
0:11:40 Now that we have a tree organisation structure that allows us to cope with this continuous descriptor space, we also want to integrate the probability that a query and a database descriptor are assigned to the same visual word. As we know that the differences between matching descriptors follow a Gaussian distribution, we can say that the probability of a feature being assigned to the other side of this overlapping buffer, or of the separating hyperplane, corresponds to the integral over this area of the sliced Gaussian distribution. So here we have a query feature, and the probability that a matching database descriptor would be assigned to a different node corresponds to this part of the sliced Gaussian distribution. Now, we have multiple quantisation steps, and of course every one of them has to be correct. That means, to find the probability that query and database descriptors are assigned to the same visual word, every step has to be correct, and thus this probability is the product of the individual probabilities. With these probabilities we can weight the distance calculation between the query bag-of-features vector and the reference bag-of-features vector. That means a feature that has been quantised more reliably has a larger contribution to this distance calculation than a feature that has been quantised less reliably. So features where we are very confident about the visual word assignment contribute more to the BoF distance computation.
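The weighting can be sketched as follows, assuming the per-split uncertainty is modelled by a Gaussian on the signed distance to each separating hyperplane (`sigma` is an assumed noise parameter, not a value from the talk):

```python
import math

def p_same_side(distance_to_plane, sigma):
    """Probability that a matching descriptor, perturbed by Gaussian noise
    with std-dev sigma, stays on the query's side of a separating hyperplane
    (the integral over the remaining part of the sliced Gaussian)."""
    return 1.0 - 0.5 * math.erfc(distance_to_plane / (sigma * math.sqrt(2.0)))

def quantization_weight(distances, sigma):
    """Every split on the path through the tree must be correct, so the
    individual per-step probabilities are multiplied."""
    w = 1.0
    for d in distances:
        w *= p_same_side(d, sigma)
    return w

# A descriptor that stayed far from every hyperplane it passed is weighted
# close to 1; one that skirted the boundaries gets down-weighted.
print(quantization_weight([3.0, 2.5, 4.0], sigma=1.0))
print(quantization_weight([0.1, 0.2, 0.1], sigma=1.0))
```

The resulting weight then scales that feature's contribution to the distance between the query and reference bag-of-features vectors.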
0:13:25 Now we want to compare our approach with the HKM algorithm, which was used so far. We can sum up and say that with the described techniques we hardly increased the query time: the adaptive clustering and the visual word weighting add only marginally to the overall query time. As you can see here, we applied the same experiment as described before, where we have to find the two closest panoramas around every query location; this is the same curve as we had before for the HKM at the range of ten meters. The MHVT allows for a significant improvement with respect to the retrieval performance, and this is even more significant if we ask the algorithms to find the four closest panoramas, or other ones within twenty meters. Most importantly, we managed to achieve an overall query time of two point five milliseconds for one thousand query descriptors on a two point four gigahertz desktop CPU, and this is a tenfold speed-up with respect to the HKM.
0:14:39 To conclude the presentation, we can say that we were facing the problem of feature quantisation on the mobile device to facilitate mobile location recognition. We addressed it by generating a multiple hypothesis vocabulary tree, which allows us to cope with ambiguous quantisation steps; we used an adaptive clustering to reduce the overfitting effects; and we integrated the probability of correct feature quantisation into the distance calculation. Altogether, this allows us to achieve a tenfold speed-up with respect to the state of the art, which results in twelve milliseconds for one thousand descriptors on a Nexus One. The combination of the MHVT with rotation-invariant fast features and tree histogram coding allows mobile real-time location recognition at thirty frames per second. I would like to thank you very much for your attention; if you have any questions, I am happy to take them.
0:15:45 That is pretty much the same as before; the transmission was not changed. We still send the same amount, the same number of visual word indices.
0:15:57 It depends very much on how many features you send. I think it is in the range of five times less; I do not want to say something wrong here, but it is about five times less than CHoG would require to send the same data.
0:16:13 So, we did not invent a new compression scheme here; it is still this coding of visual word indices. I think it is compatible, but we have no prior knowledge of the location of the mobile device. We still have just one tree that is already on the mobile device, so you can use this also for mobile product recognition or anything else; there is no prior knowledge included.