Thank you very much for the introduction. I would like to present a rapid image retrieval approach for mobile location recognition. As you all know, the availability of GPS is limited: in many scenarios we have only very few satellites in view, and signal reception is very poor in urban canyons, which makes it almost impossible to obtain a position fix in places like train stations or libraries. Yet these are exactly the places where location-based services are most interesting. Rather than relying on localization systems based on WiFi or RFID, which require infrastructure on site, we would like to use images recorded by a mobile device and match them against a visual reference, like Street View imagery, which allows us to derive the pose in a very natural way. To do that, we use content-based image retrieval to match the query images against the reference data, and among the retrieval methods we consider the state-of-the-art feature-based approaches. Applying them to the task of location recognition raises several challenges. In the image shown here, a picture taken with a mobile device downtown is supposed to be matched to the visually most similar reference image, a Street View panorama. As you can see, there is a large baseline between the two images, because we only have sparse reference data: the distance between the cameras is approximately 12.5 meters. We also have very different lighting conditions, dynamic objects like cars and pedestrians, and a very complex 3D geometry. Most importantly, we require a very low retrieval latency, because the user's position and field of view are constantly changing. These requirements are essential.
We can achieve them by, first, extracting the features on the mobile device at very low computational cost. Here we can use the rotation-invariant fast features that have been recently proposed; those require approximately 27 milliseconds per frame on a Nexus One. Now that we have these features, we want to transmit them to the server, and we can do so at low bitrate by transmitting only the visual word indices. As you know, a visual word index is a very compact representation of a feature, and this approach has been reported to be about five times more bitrate-efficient than compressed histograms of gradients. To do that, however, we require the feature quantization into visual words to run on the mobile device at very low complexity, and this is what we are talking about in this presentation. The outline is as follows: we will first discuss the state of the art and related work, then introduce the multiple hypothesis vocabulary tree, provide some details on its quantization structure, its adaptive clustering approach, and its visual word weighting, compare it with the state of the art using an experimental validation, and conclude the presentation with a short summary. As you all know, the robust and rapid quantization of features into visual words is of essential importance for the performance of these retrieval algorithms. Among the best-known algorithms is the so-called hierarchical k-means, which recursively quantizes the descriptor space by applying the k-means algorithm; this gives us a so-called vocabulary tree, where the leaf nodes are the visual words that form the vocabulary. This approach can be efficiently improved using a so-called greedy search, which considers multiple branches of the vocabulary tree to find the visual words closest to a query descriptor; this can be considered as a kind of backtracking within the vocabulary tree.
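To make the vocabulary-tree idea concrete, here is a minimal sketch, not the authors' implementation, of quantizing a descriptor by descending a hierarchical k-means tree, plus a greedy multi-branch (backtracking) variant as described above; the `Node` class and all names are hypothetical.

```python
import numpy as np

class Node:
    def __init__(self, centers=None, children=None, word_id=None):
        self.centers = centers    # (k, d) array of child centroids; None at a leaf
        self.children = children  # list of child Nodes; None at a leaf
        self.word_id = word_id    # visual word index, set only at leaves

def quantize(node, desc):
    """Plain descent: at each level pick the closest child centroid (k L2 distances)."""
    while node.word_id is None:
        dists = np.linalg.norm(node.centers - desc, axis=1)
        node = node.children[int(np.argmin(dists))]
    return node.word_id

def quantize_greedy(root, desc, n_best=3):
    """Greedy multi-branch search: keep the n_best closest children per level
    and return the leaf (visual word) with the smallest centroid distance."""
    frontier, leaves = [root], []
    while frontier:
        scored = []
        for node in frontier:
            for center, child in zip(node.centers, node.children):
                scored.append((float(np.linalg.norm(center - desc)), child))
        scored.sort(key=lambda t: t[0])
        frontier = []
        for dist, child in scored[:n_best]:
            if child.word_id is not None:
                leaves.append((dist, child.word_id))
            else:
                frontier.append(child)
    return min(leaves)[1]

# toy two-level binary vocabulary tree over 2-D descriptors
leaves = [Node(word_id=i) for i in range(4)]
left = Node(centers=np.array([[0.0, 0.0], [0.0, 2.0]]), children=leaves[:2])
right = Node(centers=np.array([[10.0, 10.0], [10.0, 12.0]]), children=leaves[2:])
root = Node(centers=np.array([[0.0, 1.0], [10.0, 11.0]]), children=[left, right])

word = quantize(root, np.array([0.0, 2.1]))  # -> visual word 1
```

In a real system the branching factor and depth are much larger; the greedy search trades extra distance computations for a lower quantization error, exactly the trade-off discussed in the talk.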
Hamming embedding, on the other hand, stores strongly quantized binary versions of the descriptors to allow for a differentiation among the features within one visual word. Finally, the approximate k-means generates a flat vocabulary by applying the k-means algorithm just once to these millions of features; to cope with the computational complexity, it applies an approximate nearest neighbor search using randomized kd-trees. To evaluate these different algorithms, we apply them to a typical location recognition task in an area of approximately four square kilometers, including about 5000 panoramas recorded along the roads, each composed of twelve rectified images. The query images have a size of 640 by 480 pixels and are represented by about 1000 features each on average. As you can see in this illustration, the small circles represent the panoramas, which are distributed along the road; the distance between them is about 12.5 meters, and each query image is placed right between two of them, shifted to the left by 45 degrees, with an opening angle of about 60 degrees. Now we would like to compare the related work, the state of the art, by precision-recall measures. To attain a recall of one, we require the algorithms to retrieve the two closest panoramas, the ones within ten meters; correspondingly, a precision of one is achieved if these two panoramas are retrieved first. If you take a look at this graph, you see that we do not have only a single precision-recall pair but multiple, since we consider up to five percent of the database: obviously, if we consider many samples, the probability that the retrieved set contains the relevant panoramas is high, whereas if we consider only a few samples the precision is high, as then hardly any irrelevant
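The precision-recall evaluation just described can be sketched as follows; this is an illustrative helper, with hypothetical names, that computes a (precision, recall) point after each retrieved item, where the relevant set is the two closest panoramas within ten meters.

```python
def precision_recall_curve(ranked_ids, relevant_ids):
    """Precision and recall after each item of a ranked retrieval list.
    relevant_ids: the ground-truth panoramas (e.g. the two within 10 m)."""
    relevant = set(relevant_ids)
    hits, points = 0, []
    for k, pid in enumerate(ranked_ids, start=1):
        if pid in relevant:
            hits += 1
        points.append((hits / k, hits / len(relevant)))  # (precision, recall)
    return points

# toy example: panoramas 2 and 7 are relevant, 5 and 9 are not
pts = precision_recall_curve([5, 2, 9, 7], [2, 7])
# after two results: precision 0.5, recall 0.5; after four: precision 0.5, recall 1.0
```

Sweeping the cutoff up to five percent of the database, as in the talk, yields the multiple precision-recall pairs plotted in the graph.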
panoramas are included. Now, if we compare the different approaches, we see that the hierarchical k-means is inferior to the other approaches, while it requires only 60 L2 distance computations, which makes it very fast. It can be efficiently improved by applying the greedy search to the HKM, at the cost of an increased query time: we require about 510 L2 distance computations to achieve this graph. The Hamming embedding requires only one third of that computational complexity, but it increases the memory requirements, as the strongly quantized descriptors have to be stored on the mobile device in addition to the HKM. Finally, the approximate k-means is set to perform 1092 L2 distance computations with eight randomized kd-trees, and, as you can see, its retrieval precision is inferior. So despite its inferior retrieval precision, the HKM is still the most suitable approach for our specific mobile location recognition task, and it was also used in the paper proposing the coding of features as visual word indices. In this setting it requires 25 milliseconds per 1000 descriptors on a 2.4 GHz desktop CPU. Unfortunately, we do not have such CPUs on mobile devices; they are in the range of about one GHz, so we would like to have even faster approaches. To this end, we use the multiple hypothesis vocabulary tree, and I will now go a little bit into the details of its quantization structure. First of all, as you all know, increasing the branching factor improves the retrieval performance; however, this ultimately leads to a linear search and thus an enormous computational complexity if you just search through all possible children. As we want to minimize the query time to achieve mobile location recognition, we limit the MHVT to binary decisions. That means we separate the descriptors along the direction of maximum variance, which is indicated by the vector here.
Actually, we split along a separating hyperplane that is placed at the mean of the descriptors within a particular node. There are obviously descriptors that are very close to this hyperplane, so the probability that a matching query descriptor ends up just on the other side of the separating hyperplane is high. To avoid these ambiguous decisions, we apply a so-called overlapping buffer around the separating hyperplane, whose width is defined by the variance of the data. If a database feature falls into this overlapping buffer, it will not be separated but will be assigned to both child nodes, and this allows us to avoid ambiguous decisions for features that are very close to the separating hyperplane. So, altogether, we let the database descriptors follow multiple hypothetical paths through the tree, just as a query descriptor could, and this makes us particularly robust against descriptor variations as they could stem, for instance, from wide baselines. Now that we can recursively subdivide the descriptor space, we could continue until a certain maximum number of features per node is reached. However, as we consider large datasets, this can result in differently sized descriptor clusters, which could stem, for instance, from different occurrence frequencies of certain textures, such as windows. To avoid the overfitting of such descriptor clusters, we want to stop the separation once the descriptor cluster is close to a hypersphere, which means that the descriptor cluster is consistent in itself and no further separation is necessary or useful. An efficient approximation to detect this is to take a look at the ratio of features that are assigned to the overlapping buffer.
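One binary split with an overlapping buffer, as described above, can be sketched like this; it is a minimal illustration, assuming the buffer half-width is a fraction of the standard deviation of the projections (the `buffer_sigma` parameter is hypothetical, not the paper's exact parameterization).

```python
import numpy as np

def split_with_buffer(descriptors, buffer_sigma=0.2):
    """One MHVT-style binary split (illustrative): project the descriptors onto
    the direction of maximum variance, place the separating hyperplane at the
    mean, and assign descriptors inside the overlapping buffer to BOTH children."""
    X = np.asarray(descriptors, dtype=float)
    # direction of maximum variance = leading eigenvector of the covariance matrix
    cov = np.cov(X, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    direction = eigvecs[:, -1]
    proj = (X - X.mean(axis=0)) @ direction   # signed distance to the hyperplane
    half_width = buffer_sigma * proj.std()    # buffer width derived from the variance
    left = np.where(proj <= half_width)[0]    # left child, including the buffer
    right = np.where(proj >= -half_width)[0]  # right child, including the buffer
    in_buffer = np.where(np.abs(proj) <= half_width)[0]
    return left, right, in_buffer
```

Descriptors in `in_buffer` appear in both children, so a query descriptor landing just on the other side of the hyperplane can still reach its matching database descriptors.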
Here, for instance, you see a node that can be very well separated: a strong variance can be observed in this direction. On the other hand, you have here a different node where the variance is almost the same in all directions, and a large fraction of the features is assigned to the overlapping buffer; this means that many more quantization steps would be required to separate it. So we stop the separation process once a certain fraction of the features falls into this overlapping buffer. This not only avoids the overfitting effects, and thus improves the retrieval performance, but also reduces the size of the tree and hence the quantization time. Now that we have a tree organization structure that allows us to cope with this continuous descriptor space, we also want to integrate the probability of a correct feature quantization, that is, the probability that matching query and database descriptors are assigned to the same visual word. As we know that the differences between matching descriptors follow a Laplacian distribution in each dimension, we can say that the probability that a feature is assigned to the other side of the overlapping buffer, or of the separating hyperplane, corresponds to the integral over this area of the Laplacian distribution. So here we have a query feature, and the probability that a matching database descriptor would be assigned to a different node corresponds to this part of the Laplacian distribution. Now we have multiple quantization steps, and of course every one of them has to be correct; that means the probability that query and database descriptors are assigned to the same visual word is the product of these individual probabilities. With these probabilities we can then weight the distance calculation between the query bag-of-features vector and the reference bag-of-features vector.
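The probability computation just described can be sketched as follows; this is an illustrative version assuming a zero-mean Laplacian with scale `b` (a hypothetical parameter) for the per-dimension differences, where the tail beyond the projection margin gives the chance of a wrong assignment at each split.

```python
import math

def correct_assignment_prob(margins, b=1.0):
    """Probability that a matching descriptor reaches the same visual word
    (illustrative). Per split, the chance of landing on the other side of the
    separating hyperplane is the Laplacian tail beyond the projection margin d,
    P(err) = 0.5 * exp(-|d| / b); over all splits these probabilities multiply."""
    p = 1.0
    for d in margins:  # signed distance of the query projection to each hyperplane
        p_err = 0.5 * math.exp(-abs(d) / b)
        p *= (1.0 - p_err)
    return p

# a query feature far from every hyperplane is quantized almost surely correctly;
# one sitting exactly on a hyperplane has only a 50% chance at that split
```

These per-visual-word probabilities are what subsequently scale the terms of the bag-of-features distance.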
That means a feature that has been more reliably quantized has a larger contribution to this distance calculation, and a feature that is less reliably quantized contributes less; so features where we are very confident about the visual word assignment contribute more to the bag-of-features distance computation. Now we want to compare our approach with the HKM algorithm, which was used so far, and we can sum up and say that with this weighting we hardly increase the query time: the adaptive clustering and the visual word weighting add almost nothing to the overall query time. As you can see here, we have applied the same experiment as described before, where we have to find the two closest panoramas around every query location, and this is the same curve as we had before for the HKM at a range of ten meters. The MHVT allows for a significant improvement with respect to the retrieval performance, and this is even more significant if we ask the algorithms to find the four closest panoramas, the ones within twenty meters. Most importantly, we managed to achieve an overall query time of 2.5 milliseconds per 1000 query descriptors on a 2.5 GHz desktop CPU, and this is a tenfold speedup with respect to the HKM. To conclude the presentation, we can say that we are facing the problem of feature quantization on the mobile device to facilitate mobile location recognition, and we addressed it by generating a multiple hypothesis vocabulary tree, which allows us to cope with ambiguous quantization steps. We use an adaptive clustering to reduce the overfitting effects, and we integrate the probability of correct feature quantization into the distance calculation. Altogether this allows us to achieve a tenfold speedup with respect to the state of the art, which results in twelve milliseconds for 1000 descriptors on a Nexus One.
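The visual word weighting described above can be sketched as a confidence-weighted bag-of-features comparison; this is a simplified illustration (plain L1 distance, no tf-idf weighting), and all names are hypothetical.

```python
import numpy as np

def weighted_bof_histogram(word_ids, confidences, vocab_size):
    """Accumulate a bag-of-features vector where each feature contributes its
    quantization confidence instead of a plain count (illustrative)."""
    h = np.zeros(vocab_size)
    for w, c in zip(word_ids, confidences):
        h[w] += c
    s = h.sum()
    return h / s if s > 0 else h

def bof_distance(q, r):
    """L1 distance between normalized bag-of-features vectors."""
    return float(np.abs(q - r).sum())

# a reliably quantized feature (confidence 0.9) dominates the histogram,
# while an ambiguous one (confidence 0.1) contributes little
q = weighted_bof_histogram([0, 1], [0.9, 0.1], 2)  # -> [0.9, 0.1]
```

In this way, ambiguous visual word assignments can no longer dominate the ranking of the reference panoramas.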
The combination of the MHVT with rotation-invariant fast features and tree histogram coding, which we have also looked at, allows mobile real-time location recognition at thirty frames per second. With that, I would like to thank you very much for your attention, and I am happy to take any questions. [Question regarding the bitrate.] The bitrate for retrieval is pretty much the same; that has not changed. We still send the same number of visual word indices, but it depends very much on how many features you send. I do not want to state a wrong number, but it is about five times less than what would be required to send the raw descriptors. So we did not invent a new compression scheme here; it is still this coding of visual word indices, but I think our approach is compatible with it. [Question regarding prior knowledge.] We have no prior knowledge on the location of the mobile device. If a coarse position were already available on the mobile device, one could of course use it for a coarse pre-selection, but at the moment there is no prior knowledge included.