Computers versus Humans

Theo Pavlidis
Distinguished Professor Emeritus
Dept. of Computer Science

Background paper for a talk given at the November 1, 2002 Emeriti Faculty Meeting at Stony Brook University
Latest revision (minor): December 23, 2002.

Abstract

The notion of computers as competing with humans in terms of "intelligence" has been around for a long time, and it periodically becomes the focus of public attention, depending on various events. Twenty years ago the cause of attention was the Japanese Fifth Generation project. In 1997 the victory of IBM's Deep Blue over Kasparov generated a fair number of claims in the popular press that, finally, computers were outsmarting humans. Today face recognition by computer is being discussed both as a help to fight terrorism and as a threat to civil liberties. I plan to provide an overview of the underlying technologies (at a level appropriate for non-specialists) and explain why certain tasks of human intelligence remain well beyond the capacity of computers and are likely to stay that way in the foreseeable future.

Introduction

The first electronic digital computers came into use about sixty years ago, mainly to perform large-scale numerical or combinatorial computations (artillery tables in the U.S., code breaking in England). They were seen by many people as "Electronic Brains" and very soon hopes (or fears) were raised that they could perform all tasks associated with human intelligence besides number crunching or symbol manipulation. Possibilities included mechanical translation, written symbol recognition, speech recognition, game playing, etc. The term Artificial Intelligence (AI) has been used to describe the research area that deals with such problems. Progress has been very slow and both the hopes and the fears have abated, but now and then an event occurs that revives the interest of the public (and the funding agencies) in the field.

The victory of the chess machine Deep Blue over Kasparov in 1997 created a certain stir in the news media because of the unwarranted conclusion that a "machine" had bested a human. I will return to this topic, but it suffices to say that the Deep Blue team included an international grandmaster, i.e. a highly skilled human chess player had been involved in the programming of Deep Blue.

Face recognition by computers has received a lot of attention recently, both favorable (as helpful for fighting crime and terrorism) and unfavorable (as an invasion of privacy). I will argue that it is probably of limited, if any, use.

My purpose is to show that we have little to hope or fear from computers taking over the world by replicating human intelligence. This is not to say that computers do not have significant social impacts. For example, credit cards could not exist without computers. Since money problems connected to credit card spending contribute to the dissolution of some marriages, we can probably claim that computers have contributed to an increase in the divorce rate. However, I will ignore such indirect effects, significant as they might be, and focus instead on the possible direct control that can be exerted by "intelligent" machines.

Because of the diversity of the intended audience (ranging from mathematicians to historians to biochemists and artists) this is not meant to be a scholarly paper. My goal is to provide a bit more depth than that of the weekly magazines and daily papers.

A Quick Look at Artificial Intelligence

Very early two distinct approaches to Artificial Intelligence (AI) emerged:

1. There were research efforts to build or program a computer modeling the human brain, in the hope that such a machine could be taught by example to perform all these tasks and that computers would eventually behave pretty much like humans. Such research usually goes under the names of generalized AI or strong AI.

2. Other researchers thought that a general solution was not feasible, but each specific problem should be addressed as an engineering problem that could be solved using an ordinary computer. The term "Artificial Intelligence" is rarely used for such efforts. Some of them are grouped under "Pattern Recognition" and others under the specific application name such as speech recognition, mechanical translation, etc. However, when there is a need to refer collectively to such research, the terms narrow AI or weak AI are used.

The first approach was the more appealing, at least to the people who had the money, so it received considerable funding. However, nontrivial funding was also devoted to projects falling under the second approach. What are the results sixty years later? Very little, if anything, has come out of the first approach. Some modest successes have resulted from the second approach, but even those fall short of the early expectations.

Hubert Dreyfus has written a good critique of the first approach. (He is the one who introduced the terms "strong AI" and "weak AI".) His work was first published in 1972 under the title "What Computers can't Do." A new edition appeared in 1992 under the title "What Computers Still can't Do" [D92]. Dreyfus presents solid philosophical and scientific arguments on why the search for (generalized or strong) AI is futile. His book also contains some interesting stories about the social dynamics of the AI researchers. While there have been several other critiques published, such as [P89], Dreyfus's work is my favorite. For those who do not wish to invest the time and effort required to follow Dreyfus's arguments I recommend a recent article in Red Herring [J02] that contains a non-technical but quite informative discussion of the subject.

The idea that we can build a machine that replicates the human brain is so absurd that it would not be worth discussing if it were not for the numerous efforts made toward that goal. The key problem has been lack of understanding of neurophysiology by some mathematicians, engineers, and computer scientists.

For example, great hopes were placed on "Connectionism," namely building a machine out of numerous small and simple units, all connected to each other, and training it so that the appropriate connections would be strengthened. There was even a company, called Thinking Machines, formed to build such "connection machines." However, the only customer of the company was the Department of Defense and the end of the cold war signaled also the end of the company. Why would anyone say that a connection machine is a model of the brain? Because the brain consists of simple cells, all connected to each other. This may be what one learns in Biology 101, but the brain is highly structured. True, each neuron is connected to several others, but there are billions of them ([D94], [RB98]).

While this approach had some very strong proponents, it was not taken seriously by most scientists. There is a simple question about such a model that must be answered before it can be taken seriously. If you claim you have a machine modeling the human brain how would you modify it to model the brain of a dog, since a dog cannot learn to write poetry, play chess, etc? No one ever provided an answer to the question.

And there were several jokes. For example, "Natural Stupidity will beat any time Artificial Intelligence." There was also an apocryphal story at Bell Labs. At the end of a talk on a model of the brain someone from the audience got up and addressed the speaker: "Sir, this might be a model of your brain, but I can assure you it is not a model of my brain."

It is also worth remembering that there is a history of considering the latest technology as a model of the brain. Thus in the 16th century irrigation networks (with sluice gates) were considered to be models of the brain. Early in the 20th century that role was taken up by telephone exchanges.

The most useful AI results were spin-offs of the research, for example, search techniques. Still, a lot of money was invested (and wasted) in AI projects. The Red Herring article [J02] mentioned earlier is addressed to venture capitalists, warning them to stay away from AI ventures. It contains the interesting statement: "The main function of generalized AI in the real world has always been to create publicity and generate funding for firms that build and sell narrow AI applications."

Neural Nets

Neural nets represent a modest version of AI. The goal is not to solve all problems, but to provide a method for solving all classification problems such as recognition of written symbols, speech, medical conditions, etc. It has been shown that neural nets are equivalent (as far as results are concerned) to certain nonparametric statistical classifiers [JMM96]. (They may have advantages in terms of speed, though.) Such classifiers have formed the basis for statistical pattern recognition. While this methodology has a certain validity of its own, it also suffers from certain limitations having to do with the need to map the physical world into a feature space, which must then be partitioned into classes.

Example: Suppose that we want to investigate height (H) and weight (W) as predictors of heart disease. In the diagram of Figure 1 full circles represent people (from a population sample) who have had a heart attack and empty circles represent those who have not. A neural net will try to fit a separator between the two, typically the one shown in the diagram as a thick line. This is the wrong result for several reasons. To start with, normal weight is not a linear function of height; it should fall somewhere between a quadratic and a cubic function of height (shown by thin lines in the diagram). Furthermore, we have incomplete information because we have ignored other factors that influence heart attacks (smoking, genetic predisposition, etc.), so predictions on the basis of weight versus height only are of limited value. The weakness of the result is obvious in this case but, in general, we have little knowledge about what the proper features are.

Figure 1: Separating people who had heart attacks (dark circles) from those who did not (open circles)

We could "play it safe" and use all possible features. For example, if our subject is recognition of written symbols we can select as features the bits of the image of the symbol. A 20 by 30 matrix is usually enough, so we can select 600 features. However, when we work in a high-dimensional space there is the danger that we can construct a spurious separation surface unless the number of data points per category exceeds the number of degrees of freedom by a significant factor. The subject of extracting features from a picture has received a lot of attention, but no general systematic methodology was ever developed. See [S89] for a collection of papers on the subject containing a wide range of ideas.
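The danger of a spurious separation surface can be made concrete with a small experiment. The sketch below is only an illustration under stated assumptions: the "images" are random 600-pixel patterns, the class labels are assigned by coin toss, and a plain perceptron stands in for whatever classifier one might use. All function names and the sample sizes (40 training and 200 test samples) are made up for this example. Because there are far fewer samples than features, a perfect separator of the training data exists even though the labels are meaningless.

```python
import random

def random_pixels(rng, n_features):
    """A fake 'image': each pixel is -1 (white) or +1 (black), chosen at random."""
    return [rng.choice((-1, 1)) for _ in range(n_features)]

def score(weights, x):
    """Weighted sum of the pixels plus a bias (stored as the last weight)."""
    return sum(w * xi for w, xi in zip(weights, x)) + weights[-1]

def train_perceptron(samples, labels, max_epochs=200):
    """Classic perceptron rule: nudge the weights whenever a sample is misclassified."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * score(weights, x) <= 0:
                for i, xi in enumerate(x):
                    weights[i] += y * xi
                weights[-1] += y
                mistakes += 1
        if mistakes == 0:  # every training sample is now on the correct side
            break
    return weights

def accuracy(weights, samples, labels):
    hits = sum(1 for x, y in zip(samples, labels) if y * score(weights, x) > 0)
    return hits / len(samples)

def spurious_separation_demo(n_features=600, n_train=40, n_test=200, seed=0):
    """Train on random labels; return (training accuracy, accuracy on fresh data)."""
    rng = random.Random(seed)
    train_x = [random_pixels(rng, n_features) for _ in range(n_train)]
    train_y = [rng.choice((-1, 1)) for _ in range(n_train)]
    test_x = [random_pixels(rng, n_features) for _ in range(n_test)]
    test_y = [rng.choice((-1, 1)) for _ in range(n_test)]
    weights = train_perceptron(train_x, train_y)
    return accuracy(weights, train_x, train_y), accuracy(weights, test_x, test_y)

train_acc, test_acc = spurious_separation_demo()
print("training accuracy:", train_acc)
print("accuracy on new samples:", test_acc)
```

The training accuracy comes out perfect, while the accuracy on new samples stays near that of a coin toss, which is what one should expect when there is nothing to learn.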

Strong advocates of neural nets tend to minimize the importance of feature extraction because, in principle, by selecting a large enough number of samples we should be able to find a reliable separator. However, "large enough" is often well beyond the practical capabilities of the development effort. In contrast, those in favor of focusing on feature extraction point out that if a proper mapping is made from the physical objects to the mathematical space, the statistical classifier is of secondary importance. This is illustrated in Figure 2. In the top diagram the two distributions overlap significantly and there is a significant difference in performance between an optimal and a sub-optimal separator. In the bottom diagram the distributions do not overlap and there is little difference in the performance of the two classifiers.

Figure 2: The importance of optimal and sub-optimal separators in distinguishing between two bell shaped distributions
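The point of Figure 2 can be checked with a little arithmetic. The sketch below assumes two equally likely classes with unit-variance bell curves centered at -m and +m, and compares the error of the optimal separator (a threshold at 0) with that of a misplaced one (a threshold at 0.5). The particular values of m and of the misplacement are assumptions chosen for illustration, not values read off the figure.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_rate(half_gap, threshold):
    """Misclassification rate for two equally likely unit-variance classes
    centered at -half_gap and +half_gap, separated at `threshold`."""
    left_class_missed = 1.0 - normal_cdf(threshold + half_gap)   # falls right of the threshold
    right_class_missed = normal_cdf(threshold - half_gap)        # falls left of the threshold
    return 0.5 * (left_class_missed + right_class_missed)

def penalty_for_suboptimal(half_gap, offset=0.5):
    """Extra error incurred by moving the threshold away from the optimum at 0."""
    return error_rate(half_gap, offset) - error_rate(half_gap, 0.0)

print("overlapping classes (m = 1):", penalty_for_suboptimal(1.0))
print("well separated classes (m = 5):", penalty_for_suboptimal(5.0))
```

With overlapping classes the misplaced threshold costs roughly three percentage points of extra error; with well separated classes the cost is on the order of one in a million, so the choice of classifier hardly matters.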

There is no escaping the need to understand the nature of the problem so that we may construct the proper mathematical model and obtain an estimate of the number of samples needed to estimate the parameters of the separator. Thus statistical classifiers (or neural nets) are useful only if the structure of the problem has been understood. This led to the development of structural pattern recognition, which places emphasis on understanding the structure of the problem and obtaining the appropriate mathematical form before applying any statistical techniques. Thus a labeled graph may be a better representation than an array of features [P77].

An underlying challenge to all classification schemes is the issue of similarity, a concept that is quite difficult to quantify. Consider, for example, the three shapes in Figure 3.

Figure 3: Illustration used to explain the difference between humanly perceived and mathematical similarity

If you ask a person which one of the three shapes does not belong with the other two, most likely the answer will be the middle shape (circle with a notch). But if we use the common integral square error (L₂ norm) to find the answer, the result will be the rightmost shape (an ellipse). A Sobolev norm will give an answer similar to that of human perception but one can construct more complex examples where that one also fails.
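The disagreement between the two norms can be reproduced numerically. The sketch below is an illustration under assumptions: each outline is described by its radius as a function of angle, the three shapes (a unit circle, a circle with a narrow V-shaped notch, and a mild ellipse) are stand-ins invented for this example rather than the shapes of Figure 3, and the first-order Sobolev (H¹) norm, which adds the integral of the squared derivative to the L₂ norm, is used as one possible Sobolev norm. The "odd one out" is taken to be the shape with the largest total distance to the other two.

```python
import math

N = 3600  # number of sampled angles around each outline
STEP = 2.0 * math.pi / N
ANGLES = [STEP * k for k in range(N)]

def circle(theta):
    return 1.0

def notched_circle(theta, depth=0.3, half_width=0.1):
    """Unit circle with a narrow V-shaped notch centered at theta = pi."""
    offset = abs(theta - math.pi)
    if offset < half_width:
        return 1.0 - depth * (1.0 - offset / half_width)
    return 1.0

def ellipse(theta, a=1.1, b=0.9):
    """Radius of an ellipse with semi-axes a and b, in polar form."""
    return a * b / math.hypot(b * math.cos(theta), a * math.sin(theta))

def derivative(r):
    """Central difference on a periodic sequence (index -1 wraps around)."""
    return [(r[(k + 1) % N] - r[k - 1]) / (2.0 * STEP) for k in range(N)]

def squared_l2(r1, r2):
    return sum((p - q) ** 2 for p, q in zip(r1, r2)) * STEP

def l2_distance(r1, r2):
    return math.sqrt(squared_l2(r1, r2))

def sobolev_distance(r1, r2):
    """H1 distance: L2 distance of the values plus L2 distance of the derivatives."""
    return math.sqrt(squared_l2(r1, r2) + squared_l2(derivative(r1), derivative(r2)))

SHAPES = {
    "circle": [circle(t) for t in ANGLES],
    "notched circle": [notched_circle(t) for t in ANGLES],
    "ellipse": [ellipse(t) for t in ANGLES],
}

def odd_one_out(distance):
    """Name of the shape with the largest total distance to the other two."""
    totals = {
        name: sum(distance(r, other) for key, other in SHAPES.items() if key != name)
        for name, r in SHAPES.items()
    }
    return max(totals, key=totals.get)

print("L2 norm singles out:", odd_one_out(l2_distance))
print("Sobolev norm singles out:", odd_one_out(sobolev_distance))
```

The narrow notch changes the outline over a very short arc, so it barely registers in the L₂ norm, but its steep sides dominate the derivative term of the Sobolev norm.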

Non-parametric classifiers, and neural nets in particular, can be quite useful once the proper mapping of the physical object to a mathematical space has been made. Their main weakness is the exaggerated claims made for them on the basis of spurious biological analogues. In a certain vague way the architecture of a neural net (shown in Figure 4) suggests the architecture of the brain. Jumping from this "similarity" to suggest that the computational device is a model of the brain is as meaningful as suggesting that a table is a model of a dog because both have four legs. The name itself (instead of non-parametric statistics) indicates the emphasis on exaggeration.

Figure 4: Organization of a neural net (The nodes with the "+" symbols mean taking a weighted sum of the inputs. The output of each such node is a nonlinear function of the input.)

During the last few years there have been claims that AI/neural nets are used with success on the web. Consider, for example, the program that Amazon uses to guess a customer's preferences and offer suggestions. At its basis are two sets: the set B of books that are being sold and the set C of customers. For each customer c there might be a list b(c) with pointers to the books that the person has bought. Similarly, for each book b there might be a list c(b) with pointers to the customers that bought the book. In Figure 5 books and customers are linked by a line whose color depends on the customer. Either list may be used to create a weighted graph G(B) whose nodes correspond to books and in which the branch linking two nodes is labeled with the number of customers that bought both books. In Figure 5 the purple line connects two books because both have been bought by the same two customers. A customer graph G(C) has nodes that correspond to customers and branches weighted by the number of books that both customers bought.

While theoretically the construction of such graphs is trivial, there are significant practical difficulties because of the large size of the sets B and C (in excess of a million). One could use anthropomorphic terms and claim that the system "learns" by observing purchases of books by customers, but that would be misleading. All the system does is update the weights of branches in a graph. Recommendations are based on identifying highly connected subgraphs. There is a need for efficient algorithms for performing such a task, as well as a need for heuristics that guide parameter selection, so building such a system is a challenging engineering task, but it would be a gross misnomer to call it an intelligent system.

Figure 5: Inferring book categories from purchasing patterns. (Or how Amazon might construct their recommendations)
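The construction of the book graph G(B) described above takes only a few lines. The sketch below is a toy version under assumptions: the customers, the book titles, and the function names are hypothetical, and the recommendation step simply ranks the neighbors of a book by branch weight, which is a much cruder rule than searching for highly connected subgraphs. It is not a description of Amazon's actual system.

```python
from collections import Counter
from itertools import combinations

def book_graph(purchases):
    """Build G(B): each key is a pair of books, each value the number of
    customers who bought both."""
    weights = Counter()
    for books in purchases.values():
        for pair in combinations(sorted(set(books)), 2):
            weights[pair] += 1
    return weights

def recommend(weights, book, top=3):
    """Rank the other books by the weight of the branch linking them to `book`."""
    scores = Counter()
    for (first, second), weight in weights.items():
        if first == book:
            scores[second] += weight
        elif second == book:
            scores[first] += weight
    return [title for title, _ in scores.most_common(top)]

# Hypothetical purchase lists b(c), one per customer.
purchases = {
    "customer 1": ["Book A", "Book B", "Book C"],
    "customer 2": ["Book A", "Book B"],
    "customer 3": ["Book B", "Book D"],
}

weights = book_graph(purchases)
print(weights[("Book A", "Book B")])   # 2: two customers bought both books
print(recommend(weights, "Book A"))    # ['Book B', 'Book C']
```

Recording a new purchase only increments a few counters, which is all the "learning" that such a system performs.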

I cannot help but add a personal story. While I was a graduate student at Berkeley I did some work on neural modeling. It had a more modest goal than modeling the brain; my objective was to model the ganglion controlling the flight of a locust. One day Jerome Lettvin visited the department and several graduate students made appointments to ask his opinion about their work. He said that what I was working on might be an interesting mathematical problem, but it had no relation to the nervous system. Then he added that I should keep that information away from the funding agency since I might lose my support if I were truthful. That probably explains the persistence of biological analogues.

The Scalability Issue

A major issue with non-parametric classifiers is that of scalability. Suppose we selected features and "trained" a classifier for 1000 samples from a population. What is the probability that this classifier will perform well for the whole population? We may have found a separator for two classes for the 1000 samples, but how valid is the separator for, say, ten classes and a million samples? Several systems that fared quite well in the laboratory have failed in practice for that reason.

One example is "VeggieVision" developed by IBM. The idea was to recognize vegetables (that are not labeled by bar codes) at checkout counters. It was relatively easy to build a system that could tell tomatoes apart from cucumbers or apples from oranges. It is far more difficult though to distinguish two varieties of oranges from each other to say nothing about distinguishing organically grown tomatoes from conventional ones. Supposedly the system was tested in Australia about two years ago and nothing has been heard about it since then.

Scalability is a key issue in face recognition. There have been several research projects with results that demonstrated successful face recognition. However, the population samples in such projects were relatively small, usually the members of a research laboratory and their friends. Samples were diverse. They included men and women of different races with different hairstyles and, for men, different amounts of facial hair, etc. I have never seen a study where all the subjects share the same major characteristics, for example, white blond men between the ages of 20 and 30 with long hair and beards. In addition, the subjects in such studies were cooperative. Expanding the method to a large and uncooperative population appears daunting.

Recognizing Written Symbols

Recognizing written symbols was probably the first pattern recognition problem to be studied. Hodges [H83] claims that Alan Turing himself was interested in it. He also tells of a visit by Norbert Wiener, who told Turing not to waste his time with that problem because it had already been solved by the Neural Nets of McCulloch and Pitts at MIT! (Incidentally, those neural nets were quite different in nature from the systems that appeared under the same name 20 years later.) This problem is usually referred to as Optical Character Recognition (OCR) and, in contrast with other AI problems, today we may venture to say that OCR is a "solved" problem. There are several good products on the market, although none of the companies that produced them are doing very well financially. The reason for the lack of financial success is that by the time the products reached the market the need for OCR had greatly diminished.

One of the original markets was archival, converting old printed text into electronic form. There was a great need for that in the 1970s, but OCR technology was not up to the task then. Therefore the conversion was done manually (usually in English speaking third world countries). By the time good OCR technology became available (around 1990) the archival market was much smaller than before. OCR has been used with some success in postal applications by reading the city, state, and zip code. By 1990 about half of the machine printed mail was successfully recognized. Around that time the US Postal Service started funding efforts towards building a next generation of OCR scanners, but the effort was curtailed because the proliferation of fax and e-mail had reduced the volume of mail. The Postal Service decided that they could never write off the cost of the new generation of OCR machines. Another common application, scanning magazine pages, has also been diminished because of the web.

It is not a pure coincidence that OCR products came out at the time that the need for them diminished. The algorithms used in the products of the 1990s were known much earlier, but they were too complex to be implemented in an economical way with the digital technology of those times. When computer hardware became cheap enough, economical implementation of sophisticated algorithms became possible. But at the same time personal computers and the internet became widespread. Collections of papers on the state of OCR and document analysis worldwide in the early 1990s (in both industry and academia) can be found in [PM92] and [BBY92].

We may gain some insight into the difficulties facing replication of human abilities in machines by examining some of the challenges that OCR had to overcome. One of them was the separation of foreground (usually black) from background (usually white). This is trivial for humans but it presents significant problems for a machine for several reasons. There may be illumination variations across the page that humans are very good at ignoring but a machine must find a way to deal with them. Another distortion occurs because of the averaging done by the optical scanning system. Thus narrow dark lines appear with the same intensity as narrow white gaps. In order to deal with this problem it was proposed to attempt recognition of written symbols directly from gray scale. However, by the time results of such efforts came to fruition ([WP93], [RP94]) OCR had lost its commercial promise and it was not cost effective for manufacturers to retool their systems.
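The effect of uneven illumination on foreground separation can be seen on a single synthetic scan line. The sketch below is an illustration under assumptions: the brightness ramp, the ink positions, the fixed threshold of 128, and the local-mean rule with its window size and fraction are all invented for this example. Comparing each pixel with the average of its neighborhood is one generic way of coping with illumination changes; it is not claimed to be the method used in any OCR product.

```python
WIDTH = 100

def background(x):
    """Paper brightness that falls off across the page (uneven illumination)."""
    return 220.0 - 1.4 * x

def is_ink(x):
    """Ground truth: two ink pixels in every run of ten."""
    return x % 10 in (4, 5)

# Ink reflects 40% of the light that the paper at the same spot would reflect.
scan_line = [0.4 * background(x) if is_ink(x) else background(x) for x in range(WIDTH)]
truth = [is_ink(x) for x in range(WIDTH)]

def global_threshold(pixels, level=128.0):
    """Call a pixel ink if it is darker than one fixed level."""
    return [p < level for p in pixels]

def local_threshold(pixels, radius=10, fraction=0.7):
    """Call a pixel ink if it is clearly darker than the mean of its neighborhood."""
    labels = []
    for x, p in enumerate(pixels):
        window = pixels[max(0, x - radius): x + radius + 1]
        labels.append(p < fraction * sum(window) / len(window))
    return labels

def errors(labels):
    """Number of pixels whose label disagrees with the ground truth."""
    return sum(1 for got, want in zip(labels, truth) if got != want)

print("errors with a fixed threshold:", errors(global_threshold(scan_line)))
print("errors with a local threshold:", errors(local_threshold(scan_line)))
```

The fixed threshold mistakes the dimly lit paper on one side of the line for ink, while the local rule labels every pixel correctly on this toy input.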

Humans are not bothered by that ambiguity because they look at the overall shape of the letter. Not only are they not bothered by distortions in the shape of a letter, they can manage to read well even when they fail to disambiguate a letter. This is because humans use context. (This is also the reason why proofreading is hard.) It is easy to confirm this by trying to read a phone book without your glasses. You are likely to do much better in reading the names (where context exists) than the numbers. Also consider the two sentences:
New York State lacks proper facilities for the mentally III.
The New York Jets won Superbowl III.
The last word in each sentence is interpreted differently, even though they are identical.
All modern OCR systems use a spelling checker/corrector. But this is not enough. Consider, for example, finding the word "dale" in the output of an OCR system. This is a valid English word, but if the scanned document is a business letter the word is almost certainly an error. The crossbar of the letter "t" can be missed so that the word "date" is converted into "dale." On the other hand, if the scanned document is a poem the word "dale" might be the correct one. Therefore we need broader context than that provided by a spelling checker.
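The "dale" versus "date" example can be turned into a toy program. Everything in the sketch below is hypothetical: the two-word dictionary, the made-up word counts per document type, and the single confusion rule (a missed crossbar turning "t" into "l") exist only to show how broader context can change the decision that a spelling checker alone cannot make.

```python
# Hypothetical dictionary: both words are valid English, so a plain spelling
# checker accepts either one.
DICTIONARY = {"date", "dale"}

# Hypothetical word counts per document type (made-up numbers for illustration).
GENRE_COUNTS = {
    "business letter": {"date": 120, "dale": 0},
    "poem": {"date": 3, "dale": 9},
}

# Assumed scanner confusion: an "l" in the output may really have been a "t".
CONFUSIONS = {"l": "t"}

def candidates(word):
    """The word as read, plus dictionary words obtained by undoing one confusion."""
    found = {word} if word in DICTIONARY else set()
    for i, ch in enumerate(word):
        if ch in CONFUSIONS:
            alternative = word[:i] + CONFUSIONS[ch] + word[i + 1:]
            if alternative in DICTIONARY:
                found.add(alternative)
    return found or {word}

def correct(word, genre):
    """Pick the candidate that is most common in documents of the given type."""
    counts = GENRE_COUNTS[genre]
    return max(sorted(candidates(word)), key=lambda w: counts.get(w, 0))

print(correct("dale", "business letter"))  # date
print(correct("dale", "poem"))             # dale
```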

Recognizing Mathematically Defined Symbols

We may contrast the reading of written symbols with the reading of bar codes. Bar codes are easy to read mechanically because they are mathematically well defined. Numerical values are encoded in a sequence of width ratios, something that can be measured quite easily. Not only are the one-dimensional codes seen in supermarkets easy to read, but so are the two-dimensional symbologies such as those seen on UPS labels (Maxicode) or on New York driver's licenses and registration stickers (PDF417). Examples are shown in Figure 6. Besides their precise definitions, such symbols also include error detection and error correction mechanisms [PSW92].

Figure 6: Examples of two-dimensional symbologies
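A simple instance of built-in error detection is the check digit of the one-dimensional UPC-A code used in supermarkets. The sketch below implements the standard UPC-A rule (digits in odd positions are weighted by 3, and the check digit brings the total to a multiple of 10). The two-dimensional symbologies mentioned above use much stronger error correcting codes, which are not shown here; the function names are made up for this example.

```python
def upc_check_digit(first_eleven):
    """Check digit for the first eleven digits of a UPC-A code."""
    digits = [int(ch) for ch in first_eleven]
    if len(digits) != 11:
        raise ValueError("expected exactly eleven digits")
    odd_sum = sum(digits[0::2])    # 1st, 3rd, ..., 11th digit
    even_sum = sum(digits[1::2])   # 2nd, 4th, ..., 10th digit
    return (10 - (3 * odd_sum + even_sum) % 10) % 10

def is_valid_upc(code):
    """True if the twelve-digit code carries the correct check digit."""
    return len(code) == 12 and code.isdigit() and upc_check_digit(code[:11]) == int(code[11])

print(upc_check_digit("03600029145"))   # 2
print(is_valid_upc("036000291452"))     # True
print(is_valid_upc("036000291752"))     # False: one digit was misread
```

A reader that misreads a single digit is therefore caught by simple arithmetic, with no need for any understanding of context.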

Recognizing Pictorial Patterns

The mechanical extraction of information from images goes under the names of Image Analysis, Computer Vision, Machine Vision, or Robot Vision. There is a wide range of applications where it is desirable to perform such a task: analysis of aerial or satellite imagery to extract information about resources or about construction activity; analysis of medical X-ray images for detection of tumors; analysis of images of industrial products for automatic inspection; etc.

Two broad approaches can be used: 1. We make an assumption about the presence of a particular object at a particular location in the image and we attempt to confirm that hypothesis (Top-down). 2. We make no such assumption and attempt to identify objects from the physical properties of the image, such as intensity and color (Bottom-up).

The top-down approach has several advantages, but it is rarely applicable in practical situations. One interesting case where it has been used successfully is in the reading of the checks sent for payment to American Express. Because payments are supposed to be in full and the amount due is known, the number written on a check is analyzed to confirm whether it matches the amount due or not.

The bottom-up approach has wide applicability, but it faces far more challenging tasks. For example, while the separation of foreground from background can be dealt with in a less than optimal way in OCR, it must be done correctly in other recognition systems. It is usually called image segmentation, since the input image must be subdivided into regions that correspond to the objects that have to be recognized. Most of the research in this area has focused on subdividing an image into regions of approximately uniform color and intensity, with the unspoken assumption that such regions would correspond to objects. Unfortunately this assumption holds only for very special cases: diffuse illumination and objects that are polyhedral. A book by Nalwa [N93] provides a critical overview of the methodologies of machine vision. The lack of a solution to this problem has meant, for example, that movie colorization was a labor-intensive process, so that few, if any, of the companies involved were profitable.

Some people (including myself) hold the opinion that significant progress in image analysis can be made only by combining the bottom-up and top-down approaches [P96]. It appears that this is the way that people actually see, and one can make several plausible arguments for the desirability of the approach. However, there are several difficulties in actually implementing such a system, including the need for considerably more computational resources. One successful example of such a combination has been in the reading of postal addresses. A bottom-up method is used to read the city, state, and ZIP code (including, of course, error checking offered by the redundancy of information). Then the postal database is used to identify street addresses that are compatible with the ZIP code and that information is used to extract the street address from the envelope. See [HT98] for a collection of papers that use contextual information for document analysis and in particular one discussing the reading of bank checks [H+98].

In addition to reading text, successes in machine vision have been achieved for the most part in controlled environments such as industrial inspection applications. In recent years successful applications have also been found when machine vision is used as an aid to, rather than as a replacement for, human vision. For example, enhanced tele-operation is more effective than autonomous robots.

Face Recognition

Automatic face recognition seems an unlikely problem to be solved by computer, for several reasons. It took over forty years to build machines of acceptable quality that recognize written symbols; what makes us think that we can solve the much more complex problem of distinguishing human faces? Nobody among those responsible for deploying face recognition systems seems to have asked how, given the difficulties we have in distinguishing, say, an "h" from a "b", we can expect to solve the far more difficult problem of distinguishing one face from another. While mechanical face recognition might be possible in principle, it represents a drastic leap in technology from what has been achieved so far by machine vision. It is also likely that face recognition may not be solved regardless of general advances in the recognition of pictorial patterns. Neuroscientists ([RB98]) point out that humans have special neural circuitry for face recognition. It is well known that people have trouble recognizing differences among people of races other than their own. There is a simple experiment that can be used to show the complexity of human face recognition. Could you try to find out in what way the two images of Figure 7 differ? The task becomes much easier if you look at the picture right side up.

Figure 7: Illustration of one of the challenges in face recognition

It is instructive to repeat the experiment with the cat pictures of Figure 8. (Right side up version.)

Figure 8: The effect of the subject on face recognition

Not surprisingly, the results of installed face recognition systems have been dismal. An ACLU press release of May 14, 2002 stated that "interim results of a test of face-recognition surveillance technology … from Palm Beach International Airport confirm previous results showing that the technology is ineffective." The release went on to say that: "Even with recent, high quality photographs and subjects who were not trying to fool the system, the face-recognition technology was less accurate than a coin toss. Under real world conditions, Osama Bin Laden himself could easily evade a face recognition system. … It hardly takes a genius of disguise to trick this system. All a terrorist would have to do, it seems, is put on eyeglasses or turn his head a little to the side." ([ACLU])

Similar conclusions appeared in a Boston Globe article of August 5, 2002. It quotes the director of a security consulting firm as saying that the "technology was not ready for prime time yet." He added that the "systems produced a high level of false positives, requiring an airport worker to visually examine each passenger and clear him for boarding." The article goes on to say: "One of the biggest deployments of the technology has occurred in England, in the London borough of Newham. Officials there claim that the installation of 300 facial-recognition cameras in public areas has led to a reduction in crime. However, they admit that the system has yet to result in a single arrest."

A recent criticism of mechanical face recognition appeared in the October 26, 2002 issue of the Economist. Ignoring the scientific evidence has resulted in a curious situation: the suppliers of the face recognition systems insist that the testers need to prove beyond reasonable doubt that their systems are faulty, instead of themselves having to prove that they are selling a valid product.

There is a web site that displays results of a program for face detection, i.e. locating the face or faces in an image, a necessary step before attempting face recognition. The results are anything but impressive. Apparently the program uses a heuristic that a face is a light round area with dark spots (for eyes, nose and mouth), which causes it to miss faces that are dark and to pick up irrelevant areas. Wearing glasses seems to cause problems because it interferes with the eyes heuristic. Figure 9 shows a blatant case of erroneous results.

Figure 9: Results of the Robotics Institute, CMU program. A green rectangle is overlaid on any face detected. A major miss is evident.

Chess Playing Machines

Chess is a deterministic game, so in theory a computer could derive a winning solution mathematically by generating all possible moves. Because the number of possible move sequences is truly astronomical (about 10¹²⁰ by Shannon's estimate), the amount of computation is too large even for our most powerful machines. Using even the fastest computer available today, it would take billions of years to consider all possible moves and, consequently, derive a winning strategy automatically. Ordinary players look only 2-3 moves ahead; skilled players may look 10 or even 20 moves ahead. Such players can do that because they are very good at pruning, i.e. ignoring moves that are not promising. Therefore at each step they consider very few possible moves (often only one!).
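The size of the task is easy to put in numbers. The short calculation below takes the 10¹²⁰ figure quoted above and assumes a hypothetical machine that examines 10¹² possibilities per second; that speed is an assumption made only for the sake of the arithmetic.

```python
POSSIBILITIES = 10 ** 120             # figure quoted in the text
EXAMINED_PER_SECOND = 10 ** 12        # hypothetical machine speed (an assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years_needed = POSSIBILITIES / (EXAMINED_PER_SECOND * SECONDS_PER_YEAR)
print(f"about {years_needed:.1e} years")
```

The result is around 3 followed by 100 zeros, so "billions of years" is, if anything, a great understatement. Making the machine a million times faster removes only six of those zeros.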

Early efforts on computer chess were driven by general AI methodology and focused on imitating the human way of play. (The research efforts were justified to the funding agencies by the claim that playing winning chess was just a special case of general problem solving.) The 1982 edition of Encyclopedia Britannica devotes half a page to computer chess and quotes the dispute between former world champions Botvinnik and Euwe on whether a computer could ever beat a human in chess. A major breakthrough came in the early 1980s when Ken Thompson (a co-creator of Unix) decided to follow a different approach and have a computer play in a fundamentally different way than the way people play. Thompson developed a chess playing program called Belle based on a minicomputer with a hardware attachment used to generate moves very fast. He did not use any special strategies or heuristics except for a book for openings and end games. Belle defeated all other computer programs and became the world computer chess champion. Pretty soon Thompson's approach to the problem was followed by other researchers. The key was to use special chess knowledge and special purpose hardware. The computer looks ahead by considering all possible combinations (without pruning) for a limited number of moves. Belle looked ahead for six moves. An evaluation function is used to select one of the positions at the end and then the move that will lead to that position. While Thompson himself was not a strong chess player, many subsequent efforts involved chess experts who worked on developing better evaluation functions.
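
The look-ahead scheme just described (try every move to a fixed depth, score the positions at the end with an evaluation function, and back the scores up to choose a move) can be sketched in a few lines. This is not Belle's code. As a stand-in for chess it uses a hypothetical toy game: a pile of counters from which each player removes 1, 2 or 3, and whoever takes the last counter wins. Depth is counted in single moves by one side, which is my assumption.

```python
def legal_moves(pile):
    """Moves available in the toy game: remove 1, 2 or 3 counters."""
    return [n for n in (1, 2, 3) if n <= pile]


def evaluate(pile, maximizer_to_move):
    """Score a position from the first player's (the maximizer's) viewpoint."""
    if pile == 0:
        # The side that just moved took the last counter and won.
        return -100 if maximizer_to_move else 100
    return 0  # no knowledge about unfinished positions


def search(pile, depth, maximizer_to_move=True):
    """Full-width look-ahead to `depth` moves, without pruning."""
    moves = legal_moves(pile)
    if depth == 0 or not moves:
        return evaluate(pile, maximizer_to_move), None
    best_score, best_move = None, None
    for move in moves:
        score, _ = search(pile - move, depth - 1, not maximizer_to_move)
        better = (best_score is None
                  or (maximizer_to_move and score > best_score)
                  or (not maximizer_to_move and score < best_score))
        if better:
            best_score, best_move = score, move
    return best_score, best_move


print(search(5, 6))  # (100, 1): taking one counter wins by force
print(search(5, 1))  # (0, 1): one move of look-ahead sees nothing decisive
```

The second call shows why the evaluation function matters: when the search cannot reach the end of the game, the quality of play depends entirely on how well unfinished positions are scored, which is where the chess experts came in.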

Since computers can be made to play good chess by using a brute force approach, their dominance over human players became a matter of time. Such dominance, though, would have no implications about human intelligence. Chess is an exact mechanical game that becomes inexact only because of the large number of combinations, so it presents challenges to human intelligence. If a fast enough machine is built that can handle chess as an exact game, it will play it in a completely different way than human players do.

Deep Blue [DBW] made extensive use of special purpose hardware (it could evaluate up to 200 million positions a second) and the expertise of Joel Benjamin, an international grandmaster (he played Kasparov to a draw in 1994) as well as that of another strong chess player, Murray Campbell. At least one member of the team has described their program as using the computer as a tool to enhance the abilities of a human player. The effort was justified within IBM because of the investigation of the special purpose hardware. Thus a computer did not defeat the human world chess champion, but another human did with the help of a computer. (The computer adds a few hundred points to a person's chess rating.) There is a page in the Deep Blue Web site [DBvK] comparing how Deep Blue and Kasparov play chess. It lists 10 points of contrast and I repeat one of them (No. 5) here:

"Garry Kasparov is able to learn and adapt very quickly from his own successes and mistakes. -- Deep Blue, as it stands today, is not a 'learning system.' It is therefore not capable of utilizing artificial intelligence to either learn from its opponent or 'think' about the current position of the chessboard."

At the time of this writing (October 2002) a chess match between a German program called Deep Fritz and the current world champion V. Kramnik had just concluded in an overall draw, with each player having 4 points. Deep Fritz does not rely on special purpose hardware, so it can examine only 2.5 million positions a second. It tries to make up for its slower speed (compared to Deep Blue) by using pruning. Kramnik stated that he has a lot of respect for the program and the team behind it [DFK].
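
How pruning saves work can be shown on a tiny example. The sketch below uses alpha-beta pruning, a standard textbook technique; the source does not say which pruning methods Deep Fritz uses, so this is only an illustration of the general idea. The game tree is a hypothetical nested list whose leaves are evaluation scores.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"),
              visited=None):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if visited is None:
        visited = []
    if not isinstance(node, list):      # a leaf holds an evaluation score
        visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:           # the opponent will never allow this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:               # we already have a better alternative
            break
    return value


tree = [[3, 5], [2, 9], [0, 7]]         # three moves, two replies each
seen = []
best = alphabeta(tree, True, visited=seen)
print(best, seen)  # 3 [3, 5, 2, 0]
```

Only four of the six end positions are evaluated, yet the result is the same as that of a full search. The saving grows quickly with depth, which is how a slower machine can look as far ahead as a faster one that examines everything.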

Concluding Remarks

Before we attempt to build a machine to achieve a goal we must ask ourselves whether that goal is realistic, and if so whether the result justifies the cost of the effort. To start with, we need to check whether our goal is compatible with the laws of nature (as we know them). Unfortunately, we witness many schemes that claim to achieve goals that violate the laws of physics, such as the various perpetual motion machine schemes. While we see articles that point out the impossibility of such schemes, we do not see similar criticism of various Artificial Intelligence schemes. Part of the reason may be that AI, as well as Computer Science in general, is a new field without well defined standards such as exist in the older Physical Sciences. There are relatively few things that have been proven rigorously to be impossible [H00]. There is a larger class of problems (the class of NP-complete problems) where there is a broadly shared opinion, but no proof, that no problem in this class can be solved in an efficient way [H00]. Some computer scientists look with admiration to Chemistry for resolving the issue of Cold Fusion within a year.

In spite of the lack of a rigorous theoretical basis, we can always ask some relevant questions. For example, how important is a task in terms of natural selection? Adding long lists of numbers does not seem to offer any selective advantages, so we may expect computers to outperform humans. On the other hand, being able to distinguish an edible plant from a poisonous one has obvious significance, thus humans are going to be much better at subtle visual cues than computers. The fact that people can perform a particular task is no evidence that it can be performed by computers, since we are dealing with two systems of vastly different abilities. The usual answer of purveyors of "solutions" is that "we have proprietary technology that simulates the human brain," etc. It seems that instead of raising a red flag such answers have a seductive effect on people with money to spend, to say nothing of the news media.

A major culprit for the pursuit of unrealistic goals is that basic research (as opposed to applied research or development) has often been funded by "mission oriented" agencies, usually with minimal scientific oversight. But the deep roots of the problem go beyond that, to the human weakness of trying to obtain something for nothing. Artificial Intelligence offered the promise of solving problems without understanding them. In this sense AI shares some of the features of Cold Fusion and other instances of pathological science, as described by Langmuir in his famous talk 50 years ago [L53]. Amongst other topics, he discussed UFOs and he concluded that in spite of the plethora of simple explanations for the observed phenomena, the story of visits by aliens was too good for the news media to let go, so he correctly predicted that it would be with us for a long time. The view of computers as giant brains that are able to out-think and replace humans is about as valid as visits by extraterrestrials, but it makes too good a story for the news media to let go, so we are going to be stuck with such stories for a long time.

Acknowledgements

I want to thank Prof. Kostas Daniilidis of the Univ. of Pennsylvania for bringing to my attention the Red Herring article on AI [J02]. He and Dr. Kevin Hunter of Neomedia made several other helpful comments on the original draft. I learned about comparisons of inverted pictures of faces from a lecture by Prof. John E. Dowling of Harvard University. I should add that I take full responsibility for all opinions expressed in this paper.

Literature Cited

g- Works of general interest
c- Works on chess playing computers
d- Works specific to document analysis

g-[ACLU] American Civil Liberties Union Web Site

d-[BBY92] H.S. Baird, H. Bunke, and K. Yamamoto Structured Document Analysis, Springer-Verlag, 1992.

g-[D94] Antonio R. Damasio Descartes' Error: Emotion, Reason, and the Human Brain, Avon Books, 1994.

c-[DBW] Deep Blue Web Site Also The people behind Deep Blue

c-[DBvK] Deep Blue vs Kasparov comparison

c-[DFK] Deep Fritz and Kramnik

g-[D92] Hubert L. Dreyfus What Computers Still Can't Do: A Critique of Artificial Reason, MIT Press, 1992.

g-[H00] David Harel Computers Ltd.: What They Really Can't Do, Oxford, 2000.

g-[H83] Andrew Hodges Alan Turing: The Enigma, Touchstone, 1983, pp. 404, 411.

d-[HT98] Jonathan J. Hull and Suzanne L. Taylor Document Analysis Systems II, World Scientific, 1998.

d-[H+98] G. F. Houle et al "A Multi-Layered Corroboration-Based Check Reader," in [HT98], pp. 137-174.

g-[JMM96] A.K. Jain, J. Mao, and M. Mohiuddin "Neural Networks: A Tutorial", IEEE Computer, Vol. 29, 3, pp. 31-44, March 1996.

g-[J02] G. James "Out of their minds" Red Herring, August 23, 2002. On the web

g-[L53] I. Langmuir "Pathological Science" Colloquium at the General Electric Knolls Research Laboratory, December 18, 1953. On the web.

g-[N93] Vishvjit S. Nalwa A Guided Tour of Computer Vision, Addison-Wesley, 1993.

d-[PM92] T. Pavlidis and S. Mori (guest editors), special issue of IEEE Proceedings on Optical Character Recognition (July 1992).

d-[PSW92] T. Pavlidis, J. Swartz, and Y. P. Wang "Information Encoding with Two-Dimensional Bar-Codes," IEEE Computer Magazine, 25 (June 1992), pp. 18-28.

g-[P89] Roger Penrose The Emperor's New Mind, Oxford Univ. Press, 1989.

d-[RP94] J. Rocha and T. Pavlidis "A Shape Analysis Model with Applications to a Character Recognition System," IEEE Trans. on Pattern Analysis and Machine Intelligence, 16 (April 1994), pp. 393-404.

d-[P96] T. Pavlidis "Challenges in Document Recognition: Bottom Up and Top Down Processes," Proc. Intern. Conference on Pattern Recognition, Vienna, Austria, Aug. 1996, vol. D, pp. 500-504 (invited paper).

g-[RB98] V. S. Ramachandran and Sandra Blakeslee Phantoms in the Brain, William Morrow, 1998.

d-[S89] J. C. Simon, editor From Pixels to Features, North Holland, 1989.

d-[WP93] L. Wang and T. Pavlidis "Direct Gray Scale Extraction of Features for Character Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence, 15 (Oct. 1993), pp. 1053-1067.
