0:00:15 Good afternoon, everybody. I will try to be as brief as possible. This is the title of the presentation, my name, and the names of my collaborators, who contributed some parts of this presentation. Let's start from the outline. Basically, I will be talking about certain problems of image retrieval using keypoints: the concept of keypoints in general, then some disadvantages of typical keypoint-based approaches; then new features based on keypoints are introduced, and then I gradually build several algorithms combining those new features with traditional SIFT descriptors. Everything will be supported by some preliminary experimental verification.

0:01:08 So basically, the problem is the detection of near-duplicate image fragments, or image patches. The setting is quite straightforward: given two images, we may not be interested in determining whether they are similar or nearly identical as whole images, but rather in whether they contain an unspecified number of similar fragments, similar patches. These are examples of images which are generally different but which contain some similarly looking fragments; the similarities are usually represented by an affine transformation, which is the generally accepted model. Another example: completely different images which nevertheless contain similar fragments, and those patches do not always represent just similar or identical objects placed on different backgrounds.
0:02:05 In general we expect results like this: we have those bottles on all kinds of backgrounds, and we expect the method to somehow show that these are identical, or nearly identical, patches even though the images are different. In another example, these two fragments are similar accidentally, while these two are similar because they represent the same object. Another example shows the same idea, and one more here: these images are different, but they share some nearly identical patches.

0:02:38 Of course, the concept of keypoint-based matching is very well known, very well established, and a very powerful tool, but in the context of detecting near-duplicate patches it has basically two major disadvantages. The first is that individual keypoint matches, even if they correctly represent local similarities, are very seldom correct in the wider context, whether local or global. Here is a very simple example: two images which are different but contain a small shared fragment, and these are the keypoint matches if we use the most restrictive technique, one-to-one keypoint correspondences based on mutual nearest neighbours of SIFT descriptions over Harris-Affine keypoints. As you see, it is very difficult to identify which fragments are nearly similar or nearly identical. Another example is even more confusing, because now we are using many-to-many correspondences based on identical SIFT words: the number of correspondences is huge, and only some of them actually correspond to the similar fragments existing in these two images.

0:03:59 Here are some statistics from the small dataset we are using for the experiments. If we have images which contain only similar fragments, the percentage of correct matches among all the matches between individual keypoints is very small. Actually, it renders these methods useless in terms of the direct use of keypoint correspondences as a tool for detecting near-duplicate fragments, near-duplicate patches. So what we generally do is support this technique with some second-step algorithm, jointly called configuration-constraint verification. We can use different techniques: RANSAC, which does not work when there are too few correct correspondences; variants of RANSAC such as PROSAC; geometric hashing; the Hough transform; large vocabularies, where identical words are used as seeds for further matching; and so on. But all of these techniques require, let's say, post-processing when matching a pair of images, which means there are certain limits on the size of the database for which such techniques can be used.
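The one-to-one baseline mentioned above (mutual-nearest-neighbour matching of keypoint descriptors) can be sketched as follows; the tiny 3-D descriptors and the brute-force distance matrix are illustrative stand-ins for 128-D SIFT descriptors and a proper index:

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """One-to-one correspondences: keep (i, j) only when descriptor i's nearest
    neighbour in B is j AND descriptor j's nearest neighbour in A is i."""
    # Pairwise Euclidean distances between the two descriptor sets.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = d.argmin(axis=1)   # for each descriptor of A, its best match in B
    nn_ba = d.argmin(axis=0)   # for each descriptor of B, its best match in A
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Tiny synthetic example: 3-D "descriptors" instead of 128-D SIFT.
a = np.array([[0.0, 0, 0], [1, 0, 0], [5, 5, 5]])
b = np.array([[1.1, 0, 0], [0.1, 0, 0], [9, 9, 9]])
print(mutual_nn_matches(a, b))   # -> [(0, 1), (1, 0), (2, 2)]
```

The mutual-nearest-neighbour condition is what makes the matching one-to-one: any descriptor can appear in at most one retained correspondence.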
0:05:21 That is why we have proposed new features which are still based on keypoints, but with which we can achieve much higher reliability from straightforward feature matching alone, with no additional verification of geometry or configuration constraints. These features are named TERM features; the name comes from Tuples of Ellipses Represented by Moments, and I will explain later why we use this name. There have been some similar ideas in the past; these are, in my opinion, the two best examples of such ideas, which nevertheless were not fully developed in this direction.
0:06:15 The features we propose are based on triplets (or, supplementarily, pairs) of keypoints, and since we propose descriptors for these features, we can convert the descriptors into vocabularies of visual words; then, instead of matching descriptors directly, we can match the features by matching visual words. Because we assume that the patches might be distorted by affine transformations, we use affine-invariant detectors, which return elliptical keypoints, as our starting point. Actually, we use two categories of detectors: Harris-Affine and MSER, both very popular.

0:07:01 Starting from the TERM3 features: if we have an ellipse, which could be a keypoint, and an external point P1, we can find two points, A01 and B01, based on the directions of the tangent lines from P1 and their points of tangency. These two points are uniquely defined by the ellipse and by the external point P1. If we have another external point, we get another pair of such points, again uniquely defined by that point and by the ellipse. So we can find four such points whenever two external points are available for a given ellipse, and then, using two of these four points and the points where the corresponding lines touch the ellipse, we can uniquely create a trapezoid which is defined by the ellipse itself and by the two external points. In other words, the geometry of this trapezoid, given the ellipse in the context of two external points, is uniquely defined; the proof is quite straightforward.

0:08:30 So if we have three ellipses, then for each of them, taking the centres of the other two as the external points, we can build such a trapezoid, and the three trapezoids together are considered a TERM3 feature. This feature represents the three ellipses, each in the context of the other two. The geometry of these features changes covariantly with the geometry of the ellipses: if we apply a linear or affine transformation to the area containing the three ellipses, the shapes of the trapezoids change correspondingly. In other words, we can relatively easily identify three ellipses which are transformed by an affine mapping just by using shape descriptors of these trapezoids which are invariant under such transformations.

0:09:38 That is the TERM3 feature. The TERM2 feature, which is based on only two ellipses, again uses three quadrilaterals which are uniquely defined by those two ellipses, and again they change covariantly with the ellipses when the ellipses are mapped by an affine transformation. But, as I said, these features are used only in a supplementary role in what follows.
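The tangency-point construction underlying these trapezoids can be illustrated numerically. The sketch below is a simplification I am adding, not the authors' code: it handles an axis-aligned ellipse x²/a² + y²/b² = 1 and one external point, finding the two points where the tangents from that point touch the ellipse via the chord-of-contact (polar) line. Repeating it for both external points gives the four points from which the trapezoid is built; the real construction works for arbitrary (affine-covariant) ellipses.

```python
import math

def tangency_points(a, b, px, py):
    """Points where the two tangents from the external point (px, py) touch the
    axis-aligned ellipse x^2/a^2 + y^2/b^2 = 1 (px != 0 assumed for brevity).

    The chord of contact (polar line) of P is px*x/a^2 + py*y/b^2 = 1;
    substituting x from it into the ellipse equation gives a quadratic in y."""
    u = py / b**2
    q2 = a**2 * u**2 / px**2 + 1.0 / b**2
    q1 = -2.0 * a**2 * u / px**2
    q0 = a**2 / px**2 - 1.0
    disc = q1 * q1 - 4.0 * q2 * q0
    if disc < 0:
        raise ValueError("point lies inside the ellipse: no tangents exist")
    pts = []
    for s in (1.0, -1.0):
        y = (-q1 + s * math.sqrt(disc)) / (2.0 * q2)
        x = a**2 * (1.0 - u * y) / px
        pts.append((x, y))
    return pts

# Unit circle, external point (2, 0): tangents touch at (1/2, +-sqrt(3)/2).
print(tangency_points(1.0, 1.0, 2.0, 0.0))
```

Because tangency is preserved by affine maps, these contact points (and hence the trapezoid built from them) move covariantly with the ellipse, which is exactly the property the TERM features rely on.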
0:10:09 Then we define the descriptors for these features, and because the shapes are very simple, we also use a very simple descriptor: the well-known affine moment invariant of the lowest order, based on the central moments of the second order divided by the moment of zero order, i.e. the area. As the descriptor of the whole feature we use seven values of this invariant, calculated gradually: for the first of the shapes, for the second one, for the third one, for one pair of them, another pair, the third pair, and finally for all of them together. This gives a very simple seven-dimensional descriptor. For the three-dimensional descriptor of TERM2 features, we use the same invariant calculated for the three contributing quadrilaterals.
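The talk does not spell out the formula, but a standard invariant matching the description (second-order central moments normalised by the zero-order moment, i.e. the area) is the first affine moment invariant of Flusser and Suk, I1 = (μ20·μ02 − μ11²)/μ00⁴ — that identification is my assumption. A sketch, computing polygon moments exactly via Green's theorem:

```python
def polygon_moments(pts):
    """Raw moments m00..m02 of a simple polygon (CCW vertices) via Green's theorem."""
    m00 = m10 = m01 = m20 = m11 = m02 = 0.0
    n = len(pts)
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        c = x0 * y1 - x1 * y0            # cross term of edge k
        m00 += c / 2
        m10 += c * (x0 + x1) / 6
        m01 += c * (y0 + y1) / 6
        m20 += c * (x0 * x0 + x0 * x1 + x1 * x1) / 12
        m11 += c * (2 * x0 * y0 + x0 * y1 + x1 * y0 + 2 * x1 * y1) / 24
        m02 += c * (y0 * y0 + y0 * y1 + y1 * y1) / 12
    return m00, m10, m01, m20, m11, m02

def affine_invariant_I1(pts):
    """First affine moment invariant I1 = (mu20*mu02 - mu11^2) / mu00^4."""
    m00, m10, m01, m20, m11, m02 = polygon_moments(pts)
    mu20 = m20 - m10 * m10 / m00
    mu11 = m11 - m10 * m01 / m00
    mu02 = m02 - m01 * m01 / m00
    return (mu20 * mu02 - mu11 * mu11) / m00**4

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# the same square under the affine map (x, y) -> (2x + y + 3, x + y - 1)
mapped = [(2 * x + y + 3, x + y - 1) for x, y in square]
print(affine_invariant_I1(square), affine_invariant_I1(mapped))  # both 1/144
```

Applying the invariant to each quadrilateral, to pairs, and to the union of all of them yields the seven-value descriptor described above.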
0:11:14 It turns out that these features are obviously very numerous: if we just take any triplet or any pair of keypoints detected in a typical image, we might get millions of features within a single image, which is of course unacceptable. So we deliberately limit the number of these features in two steps. The first step is, of course, to reduce the number of keypoints themselves: we decided to limit the number of keypoints to roughly three hundred, keeping the most prominent three hundred. Then we build the features using only a certain neighbourhood of those (at most) three hundred keypoints. These are not just traditional geometric neighbourhoods; they are determined by the ellipses themselves. In other words, we assume that two ellipses which are very close, and whose sizes are very large compared to the distance between them, most probably represent more or less the same visual content, so we reject such pairs; and if the ellipses are small compared to the distance between them, their visual contents are most probably unrelated, so we do not consider such keypoint pairs either. In other words, we use only pairs of ellipses whose centre distance is within a certain range determined by the minor and major axes of both ellipses: not less than half of the longer minor axis and not more than twice the shorter major axis (the actual thresholds are defined slightly differently, but this is the idea). Additionally, we exclude triangles of keypoints which are just too narrow, i.e. where one of the angles is less than fifteen degrees. Under such assumptions we get reasonable numbers of features. Again, these are just statistical results from our database: with the number of keypoints per image limited to roughly three hundred, we get tolerably large numbers of TERM2 and TERM3 features, up to sixteen thousand, which is not that bad.
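The two geometric filters just described can be sketched as below. Note that the exact distance thresholds are my reading of the (somewhat garbled) transcript, and the talk says the real ones are defined slightly differently: `pair_ok` accepts a pair of elliptical keypoints when their centre distance is between half the longer minor axis and twice the shorter major axis, and `triangle_ok` rejects triplets with an angle under fifteen degrees.

```python
import itertools
import math

def pair_ok(e1, e2):
    """e = (cx, cy, minor, major). Accept a pair of elliptical keypoints when
    their centre distance lies between half the longer minor axis (too close:
    both cover roughly the same visual content) and twice the shorter major
    axis (too far: the contents are most probably unrelated)."""
    d = math.dist(e1[:2], e2[:2])
    return 0.5 * max(e1[2], e2[2]) <= d <= 2.0 * min(e1[3], e2[3])

def triangle_ok(p1, p2, p3, min_deg=15.0):
    """Reject keypoint triplets whose triangle is too narrow (any angle < 15 deg)."""
    def angle(a, b, c):  # angle at vertex a, in degrees
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        cosv = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        return math.degrees(math.acos(max(-1.0, min(1.0, cosv))))
    return min(angle(p1, p2, p3), angle(p2, p1, p3), angle(p3, p1, p2)) >= min_deg

def term3_candidates(ellipses):
    """Keypoint triplets passing both the distance and the narrowness filters."""
    return [(a, b, c)
            for a, b, c in itertools.combinations(ellipses, 3)
            if pair_ok(a, b) and pair_ok(a, c) and pair_ok(b, c)
            and triangle_ok(a[:2], b[:2], c[:2])]
```

For example, three well-spread ellipses of minor axis 1 and major axis 2 at (0,0), (3,0), (0,2) pass both filters, while moving the third centre to (1.5, 0.1) makes the triangle too narrow and the triplet is dropped.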
0:13:40 An important point in using these features is the statistical independence of the individual dimensions. We used over thirty million TERM features from over one thousand images, and we found that the correlation between individual dimensions is practically negligible. This means we can assume the dimensions are just independent, and then we build the visual vocabulary simply by dividing each dimension into intervals of equal probability (two, three, four, or five intervals); each word then corresponds to one cell of the Cartesian product of such decompositions. So we have 2^7, 3^7, 4^7, and so on, words for the T3 descriptions of TERM3 features, and 2^3, 3^3, 4^3, and so on, for the T2 descriptions.
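The vocabulary construction (per-dimension equal-probability intervals, with words as cells of the Cartesian product) might look like this; the random training data is only a stand-in for the real TERM descriptors:

```python
import numpy as np

def build_bin_edges(train, bins_per_dim):
    """Per-dimension interior cut points chosen so each interval holds roughly
    the same fraction of the training descriptors (equal-probability binning)."""
    qs = np.linspace(0.0, 100.0, bins_per_dim + 1)[1:-1]
    return [np.percentile(train[:, d], qs) for d in range(train.shape[1])]

def word_id(desc, edges, bins_per_dim):
    """A visual word is one cell of the Cartesian product of the per-dimension
    bins, encoded in base `bins_per_dim`; with 7 dimensions and n bins per
    dimension this yields n**7 possible words (e.g. 3**7 = 2187 for T3)."""
    w = 0
    for v, e in zip(desc, edges):
        w = w * bins_per_dim + int(np.searchsorted(e, v))
    return w

rng = np.random.default_rng(0)
train = rng.normal(size=(10_000, 7))   # stand-in for the ~30M TERM descriptors
edges = build_bin_edges(train, 3)
ids = [word_id(row, edges, 3) for row in train]
print(min(ids) >= 0 and max(ids) < 3**7)   # -> True: words lie in [0, 3**7)
```

Because the dimensions are (assumed) independent, quantising each one separately and taking the Cartesian product wastes no vocabulary cells on unpopulated regions of the descriptor space.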
0:14:44 So that is the mathematical background; now the algorithms and their evaluation. We have used a relatively small database, available at the address shown, but the beauty of this database is that it contains near-duplicate images in which all the near-duplicate fragments are manually outlined. This means we can automatically determine whether two features or two keypoints are properly matched. Of course there are some mistakes, because a pair might match similarly looking fragments of different objects rather than the same fragment of the same object, but altogether it provides a reliable ground truth for verification.
0:15:33 We measured the performance using both traditional keypoints and words of TERM2 and TERM3 features. SIFT was used as the representation for individual keypoints, with a small vocabulary of five hundred words; this may be too small, but it turns out it is not really that bad.

0:16:05 These are the results. First, we just matched individual keypoints using one-to-one correspondences, with no TERM2 or TERM3; then again individual keypoints, using many-to-many correspondences over the same SIFT word; then methods where just individual TERM2 and TERM3 features are matched, using either one-to-one or many-to-many correspondences. It turns out that in all these cases the numbers of correct matches, whether we use individual keypoints or individual TERM2/TERM3 features, are basically at the level of statistical error: there is no use for such matching alone. However, if we combine TERM2 or TERM3 features with the keypoint descriptions, that is, if we match the features geometrically only when their contributing keypoints have the same SIFT words, the results are much better.
0:17:11 These are exemplary results for this database, and they are very good; the problem is that the number of matches is very small compared to the number of matching images. There are over five hundred images which contain similar fragments, but altogether we found only around one and a half to two thousand such features in the whole database, and they are concentrated in just fifteen to twenty images. So the method is quite precise, but its recall is poor.

0:17:49 If instead we use just words, both for the whole features (TERM3 in particular) and for the contributing keypoints, then the results are better in terms of the number of found correspondences, but the percentage of correct matches is not so good; it is much better than previously, at about twenty percent, but it is still not enough.

0:18:14 Eventually, the ultimate method combines both kinds of words. We take triplets of keypoints and match their T3 words, then we match the T2 words for each pair of keypoints within the triplet, and finally we also match the individual SIFT words of the contributing keypoints. Then, especially if we use Harris-Affine, the results are quite good. For MSER we still do not get too many matches, but for Harris-Affine there are quite many of them, and the level of correctness is very high: ninety-three percent of the retrieved features are matched properly, which means they actually belong to the same patch, or to similarly looking patches of the same object. The recall results are also not bad; the effective recall is about eighty percent, which is good, especially since some of these objects are really small, so there are not many keypoints there.
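The three-level matching just described can be sketched as follows. The feature representation here is my assumption for illustration: each T3 feature carries its T3 word, the three T2 words of its keypoint pairs, and the SIFT words of its three contributing keypoints, and two features match only when all three levels agree (compared order-insensitively for simplicity; the talk implies a consistent ordering).

```python
def match_term3(feats_a, feats_b):
    """Match T3 features between two images. A feature is a tuple
    (t3_word, t2_words, sift_words), where t2_words are the three pair-level
    words and sift_words the three contributing keypoints' SIFT words."""
    matches = []
    for i, (w3a, w2a, sa) in enumerate(feats_a):
        for j, (w3b, w2b, sb) in enumerate(feats_b):
            if w3a == w3b and sorted(w2a) == sorted(w2b) and sorted(sa) == sorted(sb):
                matches.append((i, j))
    return matches

A = [(17, (4, 9, 2), (101, 55, 301)), (8, (1, 1, 3), (42, 42, 7))]
B = [(17, (9, 2, 4), (55, 301, 101)), (17, (4, 9, 2), (101, 55, 999))]
print(match_term3(A, B))   # -> [(0, 0)]: only this pair agrees on all three levels
```

Requiring agreement at the triplet, pair, and keypoint levels simultaneously is what drives the precision up: a coincidental match on one level alone is no longer enough.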
0:19:31 That's it. These are just exemplary results returned by this method; the ellipses show the outlines, because the individual keypoints would not be well seen. As you see, the results are really quite good, and all of them are achieved by simple matching of individual features: no verification of configuration constraints, nothing like that, just matching individual features.

0:19:59 Some of these are from the database I mentioned, and some are from other publicly available databases that have been used. There are also some semantically incorrect results, as in this example.

0:20:22 So, to summarise the advantages and disadvantages: we still have to reduce the number of keypoints, which is not perfect; that was one of the conclusions. Eventually, as an ultimate solution, we want to incorporate similar data into the descriptions of individual keypoints, which would mean we could identify similarly looking fragments of images just by matching the enriched descriptions of individual keypoints, rather than by comparing features. Thank you very much, and sorry for running one minute over.
0:21:12 [Question from the audience, mostly inaudible.]
0:21:49 No, because basically there are three of them, so we cannot incorporate more, and the idea is that each of them is represented by individual words, which means we can just match sentences of a few words. That's why we combine the three of them.
0:22:11 [Question from the audience, inaudible.]
0:22:26 Well, basically I am not saying that I am identifying the outline of the patch; I am detecting the presence of the patch. Even a single keypoint or a single feature shows that there is a patch there. Actually delineating the outline of the patch is a different story, but now we can at least know that there is a patch there with a high level of confidence.
0:23:05 [Question from the audience, inaudible.]
0:23:29 I fully agree with you. The problem is that if you are just looking in an unknown, unpredictable database for some preliminary similarities, you have to start from a method which does not assume anything about the content, and this is basically such a method.
0:23:53 [Audience: what if the images contain repeated patterns?]

0:23:56 That is not a problem; then we simply get a lot of matches. There was one example with two bottles of beer over here and two cans of beer over there: basically, we found matches for all the pairs of patterns, this one with that one, and so on. So in that case we might get a lot of pairs of patches, but a good part of them are due to repetitions of the pattern. Of course it does not make too much sense beyond a point, but with two, three, or four repetitions there is no problem.
0:24:39 Okay, well, I did not have time to mention that; it basically depends on the application. With a very naive MATLAB implementation we are able to match hundreds of images within a second, on random images of course. We have not completed it yet, but we are starting on a C++ implementation with more sophisticated techniques for organising the database, and then it can be really fast.

0:25:14 [End of the session.]