The Number of All Possible Meaningful or Discernible Pictures

Theo Pavlidis, © 2009

What is the number of all possible pictures that may represent a scene? In other words, pictures that are not just a collection of random pixels but a collection of pixels that at least some human observers will interpret as representing something that could have been captured by a camera. For brevity I use the term meaningful pictures, even though the word "meaning" depends a lot on the observer. (Others may prefer the term valid pictures.)

This is a much harder question than counting all possible pictures of a given spatial and pixel resolution. For example, consider 32 by 32 images with 3 bits per pixel (one bit per color). Such an image has 32 × 32 × 3 = 3072 bits, so the number of possible images of that size is $2^{3072}$, or about $10^{925}$, a truly "astronomical" number. However, most of these pictures will not be meaningful.
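The size of this count can be checked with exact integer arithmetic. The following is a minimal Python sketch; the function name `count_raw_images` is an illustrative choice and does not come from the original note.

```python
import math

def count_raw_images(width: int, height: int, bits_per_pixel: int) -> int:
    """Number of distinct images with the given resolution and bit depth."""
    total_bits = width * height * bits_per_pixel
    return 2 ** total_bits

n = count_raw_images(32, 32, 3)

# math.log10 accepts arbitrarily large Python integers.
log10_n = math.log10(n)
print(f"bits per image: {32 * 32 * 3}")
print(f"log10 of the count: {log10_n:.1f}")
print(f"decimal digits in the count: {math.floor(log10_n) + 1}")
```

The count has 925 decimal digits, so it lies just below $10^{925}$.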

I will try to answer the question in a different way, by providing a lower bound. Suppose I subdivide an image into K by K square blocks and paint each block either white or black. Or, if that appears too artificial, I choose an image of bright sky and in each block either add the picture of a flying bird or leave the block empty. In this way all images will appear meaningful: birds flying in the sky. Each block need be only 100 by 100 pixels, as shown in Figure 1, so K can be up to 10 for 1000 by 1000 images. With 200 by 200 blocks and K equal to 10 we have 2000 by 2000 images, well within the range produced by modern digital cameras. How many such K by K pictures exist? Since each block either contains or does not contain a bird, the number is $2^{K \cdot K}$. For K equal to 10, the number is $2^{100}$, which is greater than $10^{30}$, or a million trillion trillion. (A trillion is $10^{12}$.) $10^{30}$ is not only a lower bound but also a very conservative one.

We can think of different kinds of sky, different kinds of birds, scenes of the depths of the sea with fish swimming around, flowers in a field, and so forth, so rather than assigning one of two representations to a block we could assign one of 100. Then the number of possible images is $100^{K \cdot K}$. We may also choose K equal to 20, because even for 200 by 200 blocks we get only 4000 by 4000 pixels, well within the capacity of some digital cameras. This yields $100^{400} = 10^{800}$ possible images.

$10^{30}$ is a conservative lower bound on the number of all possible meaningful/valid images.
The number of meaningful/valid images may be as high as $10^{800}$. (1)
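Both figures follow from one counting rule: a K by K grid in which every block independently takes one of c contents admits c^(K*K) images. A minimal Python sketch of that rule is given below; the function and variable names are illustrative assumptions.

```python
def count_block_images(choices_per_block: int, k: int) -> int:
    """Images obtainable by filling each block of a k-by-k grid
    with one of `choices_per_block` possible contents."""
    return choices_per_block ** (k * k)

# Bird or no bird in each block, K = 10.
conservative = count_block_images(2, 10)
# One of 100 block variants, K = 20.
generous = count_block_images(100, 20)

print(conservative)                # exact value of 2^100
print(conservative > 10 ** 30)     # True
print(generous == 10 ** 800)       # True
```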

An even harder question is how many such images are discernible by humans. For example, there may be several images of foliage, but most people could not tell them apart. This question can be further refined by distinguishing between pairs of pictures that are seen as different when displayed side by side and pairs that can be differentiated from memory. In this note I will consider only the first case. I wrote a program that uses a random number generator to create K by K boards, then (randomly) changes the color of one of the squares and displays both boards side by side. It seems that for K equal to 6 (or less) people can easily tell the difference between the two layouts. (I have asked several people to do the task and they all did it within a second or two.) You can convince yourself by looking at Figure 2 and the links listed there. It seems that the patterns are discernible for greater values of K as well, but for the purpose of this note K equal to 6 will suffice. (I am looking only for a lower bound.)
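The program itself is not listed in this note. A rough Python sketch of the procedure described above might look as follows. It is an assumption-laden stand-in, not the original code: the names `make_board_pair`, `render`, and `show_side_by_side`, the text rendering with `#` and `.`, and the fixed seed are all hypothetical choices made for illustration.

```python
import random

def make_board_pair(k: int, rng: random.Random):
    """Return (board, altered, (row, col)): a random k-by-k black/white board,
    a copy with exactly one square flipped, and the position of that square."""
    board = [[rng.randint(0, 1) for _ in range(k)] for _ in range(k)]
    row, col = rng.randrange(k), rng.randrange(k)
    altered = [r[:] for r in board]          # copy each row before changing it
    altered[row][col] = 1 - altered[row][col]
    return board, altered, (row, col)

def render(board):
    """Render a board as text lines: '#' for black, '.' for white."""
    return ["".join("#" if cell else "." for cell in r) for r in board]

def show_side_by_side(left, right) -> None:
    """Print two boards next to each other, row by row."""
    for a, b in zip(render(left), render(right)):
        print(a + "    " + b)

rng = random.Random(2008)   # fixed seed, only so that runs are repeatable
board, altered, changed = make_board_pair(6, rng)
show_side_by_side(board, altered)
```

A text rendering is much cruder than the graphical boards of Figure 2, so this sketch illustrates only how the pairs are generated, not how easy they are to tell apart.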

Figure 1 (image not reproduced here)

Figure 2 (images not reproduced here; the original page links to three 6x6 full size examples)

The number of such images is $2^{36}$, or $2^{6} \cdot 2^{30}$, or about $64 \cdot 10^{9}$, that is, 64 billion. That is certainly a conservative lower bound. By just allowing additional configurations in each square (choosing one of 10 rather than one of 2, or one of 100 as we did when we looked for all possible images) we will have $10^{K \cdot K}$ possible images. Keeping K equal to 6 we find $10^{36}$.

$64 \cdot 10^{9}$ (64 billion) is a conservative lower bound on the number of all possible discernible images. The number of all discernible images could be as high as $10^{36}$, if not higher. (2)
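The figure of 64 billion rests on approximating 2^30 by 10^9. Since 2^30 is slightly larger than 10^9, the exact count is somewhat above 64 billion, so the stated bound remains valid. A short Python check, with illustrative variable names:

```python
exact = 2 ** 36                  # number of 6-by-6 black/white boards
rounded = 64 * 10 ** 9           # 2^6 * 10^9, using 2^30 ~ 10^9

print(f"{exact:,}")              # 68,719,476,736
print(exact >= rounded)          # True: 64 billion is still a lower bound
```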

The ability of humans to differentiate images on the basis of small details, even when recalled from memory, has been documented recently by [BKAO08]; therefore the above estimates of lower bounds are not surprising. The ability of the human visual system to process very large amounts of information has been discussed by several authors from the viewpoints of both machine vision and neuroscience. Tsotsos presented an early analysis of these challenges [Ts87] and, while an explicit number of all possible images is not the focus of that paper, the parameters discussed in it point to an estimate of $10^{5400}$, much higher than the number given in (1). A recent editorial in Nature Neuroscience by Mazer [Ma08] discusses the issues from the viewpoint of neuroscience.

The order of magnitude of the numbers in both (1) and (2) poses several challenges for machine vision (including image retrieval) studies. First, it is very hard to find an image set that is dense in the space of all images, unless the type of images is severely restricted. Second, and most important, it raises several questions about pixel-based vision from both computational and neuroscience viewpoints.

Clearly, more systematic research could establish tighter bounds than those given here, but such results may be more important for psychology or neuroscience than for technology. Whether the number of all possible images is $10^{30}$ or $10^{3000}$ may be of interest to pure science, but either number implies that it is impossible to conduct research that relies on "knowing" all possible images. The bound of (2) is more than a thousand times the size of the image samples used even in the more ambitious image retrieval studies. (See, for example, the special section titled "Real-World Image Annotation and Retrieval" of the November 2008 issue of IEEE Trans. on PAMI.)

Acknowledgements: I want to thank Prof. G. Zelinsky (Dept. of Psychology, Stony Brook University) for helpful comments on an earlier draft and for pointing out Ref. [BKAO08].

Literature Cited

[BKAO08] T. F. Brady, T. Konkle, G. A. Alvarez, and A. Oliva, "Visual long-term memory has a massive storage capacity for object details", PNAS, vol. 105, no. 38 (Sept. 23, 2008).

[Ma08] J. A. Mazer, "So many pixels, so little time", Nature Neuroscience, vol. 11 (Nov. 2008), pp. 1243-1244.

[Ts87] J. K. Tsotsos, "A 'Complexity Level' Analysis of Vision", ICCV, 1987.

Draft first posted March 14, 2008 (major revision 3/22/08). Marked as "old" draft May 7, 2009.
