Meeting Some of the Challenges
© Copyright 2008 by T. Pavlidis

This page refers to a challenge posted earlier on this site, where the five images referred to by the letters A-E are shown. The same images can also be found in Section 2 of the paper Limitations of Content-based Image Retrieval.

It seems that measures that classify images A and B as closer to each other than to C can be found by methods that rely on local features. Similar measures also find that D and E are closer to each other than to C, but the distance of either one to A or B is comparable to its distance to C. In other words, such methods cannot handle the matching problem when an object that appears in two images has significantly different poses in each.

I. Comparing Local Histograms
Each image is converted to gray values and divided into d×d squares. On each square the local histogram is classified as having a single peak over a narrow zone ("flat" region), being bimodal with distinct peaks ("sharp" region), or having neither of these forms ("full" region). The percentage of each type of region in an image is computed, and the resulting three numbers are used as a feature vector. The method fails when d is comparable to the image dimensions because the histograms are no longer local. (In this implementation the modality of the histograms was calculated approximately by using adaptive multi-thresholding.) Below are the results for some values of d.
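The block classification described in this section can be sketched roughly as follows. This is a minimal illustration and not the implementation used for the results: the adaptive multi-thresholding step is replaced here by a much simpler range test, and the names `classify_block` and `feature_vector` as well as the thresholds `narrow` and `gap` are assumptions introduced only for this example.

```python
from collections import Counter


def classify_block(pixels, narrow=16, gap=64):
    """Label one block of gray values (0..255) as 'flat', 'sharp', or 'full'.

    Simplified stand-in for the modality test: `narrow` and `gap` are
    hypothetical thresholds, not values taken from the original method.
    """
    lo, hi = min(pixels), max(pixels)
    if hi - lo <= narrow:
        return "flat"  # all values fall in one narrow zone

    # Split at the midpoint of the value range and test for two tight,
    # well-separated clusters (a crude proxy for a bimodal histogram).
    mid = (lo + hi) / 2
    low = [p for p in pixels if p <= mid]
    high = [p for p in pixels if p > mid]
    two_tight_peaks = (max(low) - min(low) <= narrow
                       and max(high) - min(high) <= narrow)
    if two_tight_peaks and min(high) - max(low) >= gap:
        return "sharp"
    return "full"  # neither of the two forms above


def feature_vector(image, d):
    """Return the percentages of (flat, sharp, full) d-by-d blocks.

    `image` is a list of rows of gray values; partial blocks at the right
    and bottom edges are ignored in this sketch.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(0, rows - d + 1, d):
        for c in range(0, cols - d + 1, d):
            block = [image[r + i][c + j] for i in range(d) for j in range(d)]
            counts[classify_block(block)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("d is larger than the image")
    return tuple(100.0 * counts[k] / total for k in ("flat", "sharp", "full"))
```

Two images would then be compared through the distance between their three-component vectors. The range test above is sensitive to isolated outlier pixels, which is one reason a histogram-based modality estimate is preferable in practice.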

II. Matching Based on SIFT Keypoints
These tests were carried out by Ms. Jung-Eun Lee, a doctoral student at Michigan State University working under the supervision of Professor Anil Jain. The percentage is calculated as the ratio of the number of matches to the average of the number of keypoints in the two images. The SIFT method is described in [1].
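The percentage defined above can be written out as a small helper. This is only a sketch of the arithmetic: the function name `match_percentage` and its parameters are hypothetical, and the keypoint detection and matching themselves are assumed to have been done elsewhere.

```python
def match_percentage(num_matches, keypoints_a, keypoints_b):
    """Matches as a percentage of the mean keypoint count of two images."""
    mean_keypoints = (keypoints_a + keypoints_b) / 2
    if mean_keypoints == 0:
        return 0.0  # no keypoints in either image, so nothing can match
    return 100.0 * num_matches / mean_keypoints
```

For example, 30 matches between an image with 100 keypoints and one with 200 keypoints gives 30 / 150, that is, 20%.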

Comments
Both methods have roughly similar results, and both of them were "trained" on a different set of images. A and B are listed closer to each other than to C. Both show D and E close to each other but far from A. Method I finds D and E closer to each other than the A, B pair, while the opposite holds for Method II. Also, Method I finds C and E close to each other, obviously a wrong result. On the other hand, Method I uses only three features while Method II relies on matching hundreds of keypoints.

Citations
1. D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int. J. Comp. Vision, 2004.