Limitations of Content-based Image Retrieval

© Copyright 2008 by T. Pavlidis

Appendix C: Color Histograms Provide No Semantic Information

Color histograms have often been used for image retrieval, with impressive results reported in published papers. However, it is easy to construct images of quite diverse scenes with similar histograms. Below are two groups of six images each, with their respective color histograms. Each image of the first group (Table C1) has been derived from an image of the second group (Table C2) by a histogram equalization algorithm, which transforms the original so that the frequency of each color value is approximately constant. I used a rather simple implementation that relies on a pseudo-random number generator to assign frequently occurring values in the original to intervals. My purpose was not to create perfectly flat histograms but only histograms that do not differ greatly from image to image, even though the semantics of the images are quite different. Histograms have nothing to do with the semantics of images; describing the phenomenon as a "semantic gap" is a gross understatement.
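The implementation used for the tables is not reproduced here. The Python sketch below is only one possible reading of the description above, for a single color channel given as a flat list of integers. The names equalize_channel, histogram, levels and seed are illustrative assumptions, as is the particular way of choosing the intervals: each original value receives an interval of output levels whose width is proportional to how often the value occurs, and every pixel with that value draws its new level at random from the interval.

```python
import random


def histogram(pixels, levels=256):
    """Count how many pixels hold each value in range(levels)."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def equalize_channel(pixels, levels=256, seed=0):
    """Approximately flatten the histogram of one color channel.

    Assumed scheme (not taken from the original text): value v is mapped
    to the interval [lo[v], hi[v]] of output levels, sized in proportion
    to the frequency of v, and each pixel picks a level in it at random.
    """
    rng = random.Random(seed)
    n = len(pixels)  # must be non-zero
    counts = histogram(pixels, levels)

    lo, hi = [0] * levels, [0] * levels
    below = 0  # number of pixels with a value smaller than v
    for v in range(levels):
        lo[v] = min(levels - 1, below * levels // n)
        below += counts[v]
        # Rare values may get an interval of a single level.
        hi[v] = max(lo[v], below * levels // n - 1)

    return [rng.randint(lo[p], hi[p]) for p in pixels]


# Toy channel: three quarters of the pixels are 0, the rest are 255.
channel = [0] * 3000 + [255] * 1000
flattened = equalize_channel(channel)
print(max(histogram(channel)), max(histogram(flattened)))
```

Because the intervals are laid out in order of increasing original value, a darker pixel never becomes brighter than a pixel that was originally lighter, which is why the equalized image remains recognizable. For a color image the same function would be applied to each channel separately.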

Table C1: Histogram-equalized images and their color histograms.

If you compare a histogram-equalized image of Table C1 above with the corresponding original image of Table C2 below, you will find few semantic differences, if any, in contrast to the striking differences between their histograms.
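The comparison can also be made numerically. The sketch below uses histogram intersection, a similarity measure common in histogram-based retrieval; the choice of measure is an assumption made for illustration, and the function names and toy data are hypothetical. It shows both halves of the argument on one channel: rearranging the pixels leaves the histogram untouched, while a simple rescaling of the values, which keeps the content, changes the histogram substantially.

```python
def normalized_histogram(pixels, levels=256):
    """Fraction of pixels holding each value in range(levels)."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    n = len(pixels)
    return [c / n for c in counts]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalized histograms.

    1.0 means identical histograms, 0.0 means no shared bins.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy channel using only the dark values 0..63, each equally often.
original = [i % 64 for i in range(4096)]

# Same values in a different spatial order: the content is scrambled.
rearranged = original[::-1]

# Same spatial structure with values stretched to the range 0..252.
stretched = [4 * p for p in original]

h = normalized_histogram(original)
print(histogram_intersection(h, normalized_histogram(rearranged)))  # 1.0
print(histogram_intersection(h, normalized_histogram(stretched)))   # 0.25
```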

Table C2: Original images and their color histograms.

While only six pairs of images are shown, the experiment can be repeated with any image unless one or more colors have their intensities concentrated in very few values (for example, if the color red is absent, so that the red histogram has values only at zero).

Latest update June 14, 2008