Date: Mon, 07 Oct 1996 04:36:11 +0100
From: hlr@well.com
http://www.well.com/user/hlr/texts/tft6.html
Tools For Thought: The People and Ideas of the Next Computer Revolution
Howard Rheingold
First published by Simon & Schuster, 1985. Copyright Howard Rheingold, 1985. This book is out of print; all rights have reverted to the author. Feel free.
Chapter Six: Inside Information
His unicycle skills notwithstanding, Claude Shannon has been no more flamboyant but no less brilliant than his elder colleagues. Rather than advertising his own genius like Wiener, or blitzing the world of science with salvo after salvo of landmark findings like von Neumann, Claude Shannon has published sparingly, and he spends more time attempting to diminish rather than embellish the mythology that grew up around his infrequent but monumental contributions. A modest man, perhaps, but hardly a timid one: when Shannon has something to publish, it usually changes the world.
Claude Shannon was a bona fide prodigy, twenty-two years old when he published (in 1937) the famous MIT master's thesis that linked electrical circuitry to logical formalisms. He was the peer of pioneers like Turing, Wiener, and von Neumann, the teacher of the first generation of artificial intelligence explorers like John McCarthy and Marvin Minsky, and the mentor of Ivan Sutherland, who has been one of the most important contemporary infonaut-architects. When Shannon's papers establishing information theory were published in 1948, he was thirty-two. The impact on science of this man's career would be incalculable for these two contributions alone, but he also wrote a pioneering article on the artificial intelligence question of game-playing machines, published in 1950. In 1953, at about the same time von Neumann and Turing were both thinking about the mathematical possibilities of self-reproducing machinery, Shannon published another major work on the subject of these special automata.
In 1956, at the age of forty, Shannon was one of the organizers of the conference at Dartmouth that gave birth to the field of artificial intelligence. From the prewar discoveries that scooped Wiener and von Neumann, to the explorations in the 1950s that led to both AI and multiaccess computer systems, his life and ideas formed the single most important bridge between the wartime origins of cybernetics and digital computers and the present age of artificial intelligence and personal computing.
What Shannon did in 1937 was to provide a way to design machines based on the logical algebra described a century before by George Boole. Boole, in The Laws of Thought, stated that he had succeeded in connecting the process of human reason to the precise symbolic power of mathematics. There were only two values in the logical calculation system that Boole proposed: 1 and 0. If a value is true, it can be designated by the symbol 1; and if it is false, the symbol 0 can be used. In this system, a truth table describes the various possible logical states of a system. Given an input state, a truth table for a specific operation determines the appropriate output state whenever that operation is applied to that input. Another way of saying that would be that given a starting tape, the truth table determines what the ending tape will be.
In Boolean algebra, one fundamental logical operation is not, an operation that reverses the input, so that the output of a "not" operation is the opposite of the input (remember that there are only two symbols or states). Another fundamental operation is and, which dictates that the output is true (or "on" or "1") if and only if every one of the several inputs is also true ("on," "1"). For example, the listing in the table for "A is true and B is true" would be set to "1" when A is "1" and B is "1" and set to "0" in all other cases. One could look up the answer in the truth table by finding the input row where both A and B are equal to 1:
NOT                      AND

Input   Output           Input A   Input B   Output
  0       1                 0         0         0
  1       0                 0         1         0
                            1         0         0
                            1         1         1
The way that results are determined by matching the proper rows and columns in the truth tables, a purely automatic procedure, has a crucial resemblance to the "instruction tables" Turing proposed.
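For readers who want to see the tables in action, here is a minimal sketch in Python. The function names are chosen for illustration only; they come from neither Boole nor Shannon.

    def NOT(a):
        return 1 - a          # reverses the input: 0 becomes 1, 1 becomes 0

    def AND(a, b):
        return a & b          # outputs 1 only when both inputs are 1

    # Reproduce the two truth tables by checking every possible input row,
    # exactly the automatic lookup procedure described above.
    for a in (0, 1):
        print(a, "->", NOT(a))

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", AND(a, b))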
One of the important features of Boolean algebra is the way logical operations can be put together to form new ones, and collections of logical operations can be put together to perform arithmetic operations. Logical syllogisms can be constructed in terms of operations on zeroes and ones, by arranging for the output of one truth table to feed input to another truth table. For example, it turns out that by putting a not before every and input, and putting another not after its output, it is possible to build an "or" operation. By stringing various sequences of only these two basic operations, "not" and "and," it is possible to build procedures for adding, subtracting, multiplying, and dividing. Logic and arithmetic are thus intimately and simply related. What nobody knew until Shannon told us was that the same algebra could describe the behavior of electrically switched circuits.
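Written out as a sketch (again purely illustrative, not anything from Shannon's thesis), the composition looks like this: an "or" built by wrapping "not"s around an "and," and those pieces in turn wired into a one-bit adder.

    def NOT(a):
        return 1 - a

    def AND(a, b):
        return a & b

    def OR(a, b):
        # A "not" before every "and" input and another "not" after its output.
        return NOT(AND(NOT(a), NOT(b)))

    def XOR(a, b):
        # "One or the other, but not both," built from the same parts.
        return AND(OR(a, b), NOT(AND(a, b)))

    def half_adder(a, b):
        # Adds two single bits: XOR gives the sum bit, AND gives the carry.
        return XOR(a, b), AND(a, b)

    for a in (0, 1):
        for b in (0, 1):
            total, carry = half_adder(a, b)
            print(a, "+", b, "=", carry, total)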
Equally important was the way these combinations of logical and arithmetic operations could be used to build a "memory" operation. Boolean algebra makes it possible to devise a procedure, or build a device, the "state" of which can store specific information--either data or operations. If electrical circuitry can perform logical and mathematical operations, and can also store the result of those operations, then electronic digital computers can be designed.
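One way to picture such a memory device is to cross-couple two gates so that each one's output feeds the other's input. The little simulation below is offered only as an illustration of the principle, not as a circuit from Shannon's work: a set-reset latch that keeps holding a bit after the input that set it has gone away.

    def NOR(a, b):
        # NOT applied to OR: outputs 1 only when both inputs are 0.
        return 1 - (a | b)

    def latch(set_bit, reset_bit, q=0, q_bar=1):
        # Feed the two outputs back into each other until they settle.
        for _ in range(4):
            q, q_bar = NOR(reset_bit, q_bar), NOR(set_bit, q)
        return q, q_bar

    q, q_bar = latch(set_bit=1, reset_bit=0)             # store a 1
    q, q_bar = latch(set_bit=0, reset_bit=0, q=q, q_bar=q_bar)
    print(q)                                             # still 1: remembered
    q, q_bar = latch(set_bit=0, reset_bit=1, q=q, q_bar=q_bar)
    print(q)                                             # back to 0: erased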
Until Shannon, Boolean algebra had been a curious and almost totally forgotten eddy in the mainstream of mathematical thought for almost a century, and was certainly unknown to the more practical-minded world of physics and electrical engineering. And that is where the genius of Shannon's rediscovery lies, for he was writing a thesis in electrical engineering, not mathematical logic, and the objects of his concern were not the processes of thought but the behavior of large circuits of electrical switches connected together into the kinds of circuits one finds in a telephone system.
Shannon was interested in the properties of complicated electrical circuits that were built from very simple devices known as relays. A relay is a switch--a device that opens or closes a circuit, permitting or blocking the flow of electrical current--not unlike an ordinary light switch, except a relay is not switched on or off by a human hand, but by the passage of an electrical current.
A relay contains an electromagnet. When a small current flows into the relay, the electromagnet is activated, closing the circuit controlled by the relay until the input current is turned off. In other words, the electromagnet is a small electrical circuit that opens and closes another electrical circuit. The circuit of one relay can also control the electromagnet of the next relay, and so on, until you have a complete circuit that is made of nothing but switches, all controlling one another, depending on how they are set at the beginning and how they are altered by new input.
Each relay and circuit controlled by that relay can be in only one of two states, on or off. This two-state characteristic of switched circuits is what links electricity to logic, for each relay-controlled circuit can be seen as a truth table, where current flows from the output only when specified input conditions are satisfied, and each logical operation can be seen as a physical device that emits an output pulse if and only if its input switches are set to some specified combination of on and off.
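A rough model in code can make the chain-of-switches idea concrete. The class below is only an illustration of a relay, with invented names: its contact passes current while its coil is energized, and two of them wired in series reproduce the "and" truth table.

    class Relay:
        """Contact closes only while the coil carries current."""
        def __init__(self):
            self.coil = 0
        def contact(self, supply):
            return supply if self.coil else 0

    # Two relays in series: current reaches the far end only if both
    # coils are energized, an AND made of nothing but switches.
    first, second = Relay(), Relay()
    for a in (0, 1):
        for b in (0, 1):
            first.coil, second.coil = a, b
            print(a, b, "->", second.contact(first.contact(1)))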
In the 1930s, telephone systems were using ever larger and more complicated mazes of circuits controlled by these relays. Instead of requiring a human operator to plug the proper jack into the right part of a switchboard, relays could close the circuit when the specified input conditions were reached. Using relays, all kinds of useful things could be done in the way of automatic dialing and routing. But the growing complexity of the circuitry was getting to be a problem. It was becoming harder and harder to figure out what these big collections of switches were doing.
Shannon was looking for the mathematical procedure best suited for describing the behavior of relay circuits. His thesis showed how George Boole's algebra could be used to describe the operations of these complex circuits. And he was not unaware of the implications of the fact that these circuits could now be designed to represent the operations of logic and arithmetic. [1]
If logic was the formal system that most closely matched the operations of human reason, and if Boole's truth tables could embody such a formal system of simulated reasoning, then by using truth tables as the "instruction tables" Turing discussed, and switching devices like relays to represent the "states" of the machine (or the cells of the tape), it would be possible to build electrical circuits that could simulate some of the logical operations of human thought.
When the digital computer builders got together to plan the future development of the technology, Shannon was in the thick of it--and he didn't hesitate to remind his colleagues that what they were building was the first step toward artificial intelligence. But during the ten years immediately following his first breakthrough, Shannon turned to a different aspect of this new field. His new employer was Bell Laboratories, and the electrical or electronic communication of messages was his specialty. AT&T, the foremost communication company in the world, was the owner of Bell Laboratories, so naturally the laboratory was interested in supporting Shannon's probes into the fundamental nature of communication. Shannon was encouraged to pursue interesting questions such as: When something is communicated, what is delivered from one party to another? When a communication is obscured by noise or encryption, what fails to get across?
This was the communication part of the communication and control problem pointed out by Wiener. During the war, working at top-secret defense projects for Bell Laboratories, Shannon was involved in cryptological work that brought him into contact with Turing. After the war, Shannon concentrated on describing the nature of the entity they were communicating and manipulating with all these logical and mathematical circuits.
At this point, nobody knew, exactly, what information was. Just as he had found the perfect tool for describing relay circuits, after the war Shannon wanted to find mathematical tools for precisely defining the invisible but powerful commodity that these new machines were processing. He succeeded in finding the descriptive tools he sought, not in an obscure corner of mathematics, as in the case of Boole's algebra, but in the fundamental laws governing energy.
Like Turing, Shannon put a surprise finishing touch on a project that scientists had worked at for centuries. In this case, the quest was not to understand the nature of symbol systems, but a more pragmatic concern with the nature of energy and its relation to information. Although Shannon was specifically looking at the laws underlying the communication of messages in man-made systems, and generally interested in the difference between messages and noise, he ended up dealing with the laws governing the flow of energy in the universe. In particular, he discovered the secrets of decoding telephone switching networks, hidden in the work of previous scientists who had discovered certain laws governing heat energy in steam engines.
Back when the Industrial Revolution was getting started, and steam-powered engines were the rage, it became a practical necessity to find out something about the efficiency of these energy-converting devices. In the process, it was discovered that something fundamental to the nature of heat prevents any machine from ever becoming perfectly efficient. The study of the movement of heat in steam engines became the science of thermodynamics, given precise expression in 1850 by Rudolf Clausius, in his two laws of thermodynamics.
The first law of thermodynamics stated that the energy in a closed system is constant. That means that energy cannot be created or destroyed in such systems, but can only be transformed. The second law states, in effect, that part of that unchangeable reservoir of energy becomes a little less stable every time a transformation takes place. When you pour hot water into cold water, you can't separate it back into a hot and a cold glass of water again (without using a lot more energy). Entropy, meaning "transformation," was the word Clausius later proposed for that lost quantity of usable energy.
Entropy as defined by Clausius is not just something that happens to steam engines or to glasses of water. It is a universal tendency that is as true for the energy transactions of the stars in the sky as it is for the tea kettle on the stove. Because the universe is presumed to be a closed system, and since Clausius demonstrated that the entropy of such systems tends to increase with the passage of time, the gloomy prediction of a distant but inevitable "heat death of the universe" was a disturbing implication of the second law of thermodynamics. "Heat death" was what they called it because heat is the most entropic form of energy.
But the gloomy news about the end of time wasn't the only implication of the entropy concept. When it was discovered that heat is a measure of the average motion of a population of molecules, the notion of entropy became linked to the measure of order or disorder in a system. If this linkage of such disparate ideas as "heat," "average motion," and "order of a system" sounds confusing, you have a good idea of how nineteenth-century physicists felt. For a long time, they thought that heat was some kind of invisible fluid that was transferred from one object to another. When it was discovered that heat is a way of characterizing a substance in which the molecules are, on the average, moving around faster than the molecules in a "cold" substance, a new way of looking at systems consisting of large numbers of parts (molecules, in this case) came into being. And this new way of looking at the way the parts of systems are arranged led, eventually, to the entropy-information connection.
Because "average motion" of molecules is a statistical measure, saying something about the amount of heat in a system says something about they way the parts of that system are arranged. Think about a container of gas. The system in this case includes everything inside the container and everything outside the container. The gas is considered to be hot if the average energy of the molecules inside the container is higher than the average energy of the molecules outside the container. Some of the molecules inside the container might, in fact, be less energetic (cooler) than some of the molecules outside the container--but on the average, the population of molecules inside are more energetic than the population of the molecules outside.
There is a certain order to this arrangement--energetic molecules are more likely to be found inside the container, less energetic molecules are more likely to be found outside. If there were no container, the highly energetic molecules and the less energetic molecules would mix, and there would be no sharp differentiation between the hot parts and the cold parts of the system.
A system with high entropy has a low degree of order. A system with low entropy has a higher degree of order. In a steam engine, you have the heat in one place (the boiler) and it is dissipated into the cold part (the condenser). This is a very orderly (low entropy) system in the sense that anyone can reliably predict in which part of the engine the hot molecules are likely to be found. But when all the parts of a steam engine are the same temperature, and the hot and cold molecules are equally likely to be found in the boiler and the condenser (and hence the entropy is high), the engine can't do any work.
Another physicist, Boltzmann, showed that entropy is a function of the way the parts of the system are arranged, compared with the number of ways the system can be arranged. For the moment, let's forget about molecules and think about decks of cards. There is a large number of ways that fifty-two cards can be arranged. When they come from the factory, every deck of cards is arranged in a definite order, by suit and by value. With a little bit of thought, anybody can predict which card is the fifth from the top of the deck. The predictability and orderliness disappears when the deck is shuffled.
An unshuffled deck of cards has a lower degree of entropy because energy went into arranging it in an unlikely manner. Less energy is then required to put the deck into a more probable, less orderly, less predictable, more highly entropic state. According to the second law of thermodynamics, all decks of cards in the universe will eventually be shuffled, just as all molecules will eventually have the same average energy.
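Boltzmann's idea can be put in rough numbers. The sketch below is a back-of-the-envelope illustration, with base-2 logarithms chosen to anticipate Shannon's bits: it compares the single known arrangement of a factory-fresh deck with the 52! possible arrangements of a shuffled one.

    import math

    factory_arrangements = 1                     # one known ordering
    shuffled_arrangements = math.factorial(52)   # 52! possible orderings

    print(math.log2(factory_arrangements))    # 0.0: no uncertainty at all
    print(math.log2(shuffled_arrangements))   # about 225.6: high entropy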
James Clerk Maxwell, yet another nineteenth-century scientist, proposed a paradox concerning this elusive quality called entropy, which seems to relate such intuitively dissimilar measures as energy, information, order, and predictability. The paradox became infamous among physicists under the name "Maxwell's demon." Consider a container split by a barrier with an opening small enough to pass only one molecule at a time from one side to another. On one side is a volume of hot gas, in which the average energy of the molecules is higher than the average energy of the molecules in the cold side of the container. According to the second law, the hotter, more active molecules should eventually migrate to the other side of the container, losing energy in collisions with slower moving molecules, until both sides reach the same temperature.
What would happen, Maxwell asked, if you could place a tiny imp at the molecular gate, a demon who didn't contribute energy to the system, but who could open and close the gate between the two sides of the container? Now what if the imp decides to let only the occasional slow-moving, colder molecule pass from the hot to the cold side when it randomly approaches the gate? Taken far enough, this policy could mean that the hot side would get hotter and the cold side would get colder, and entropy would decrease instead of increase without any energy being added to the system!
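The demon's policy is easy to mimic in a toy simulation, offered here purely as an illustration with invented numbers: molecules wander up to the gate at random, and the gate opens only for a hot-side molecule moving more slowly than the cold side's average. No energy is added, yet the two averages drift apart.

    import random

    random.seed(1)
    hot = [random.uniform(0.5, 3.5) for _ in range(500)]    # faster on average
    cold = [random.uniform(0.0, 2.0) for _ in range(500)]   # slower on average

    def average(speeds):
        return sum(speeds) / len(speeds)

    print("before:", round(average(hot), 2), round(average(cold), 2))

    gate_threshold = average(cold)        # the demon's rule of thumb
    for _ in range(5000):
        i = random.randrange(len(hot))    # a random molecule nears the gate
        if hot[i] < gate_threshold:       # only the slow ones get through
            cold.append(hot.pop(i))

    print("after: ", round(average(hot), 2), round(average(cold), 2))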
In 1922, a Hungarian student of physics by the name of Leo Szilard (later to be von Neumann's colleague in the Manhattan project), then in Berlin, finally solved the paradox of Maxwell's demon by demonstrating that the demon does indeed need to contribute energy to the system, but like a good magician the demon does not expend that energy in its most visible activity--moving the gate--but in what it knows about the system. The demon is a part of the system, and it has to do some work in order to differentiate the hot and cold molecules at the proper time to open the gate. Simply by obtaining the information about molecules that it needs to know to operate the gate, the demon adds more entropy to the system than it subtracts.
Although Szilard showed implicitly that information and entropy were intimately connected, the explicit details of the relationship between these two qualities, expressed in the form of equations, and the generalization of that relationship to such diverse phenomena as electrical circuits and genetic codes, were not yet known. It was Claude Shannon who made information into a technical term, and that technical term has since changed the popular meaning of the word.
Another puzzle related to entropy, and the cryptic partial solution proposed for it in 1945 by another physicist, provided a second clue linking entropy to information. Quite simply: If the universe tends toward entropy, how does life, a highly ordered, energy-consuming, antientropic phenomenon, continue to exist? In a universe flowing toward disorder, how on earth did one-celled creatures complicate themselves enough to build a human nervous system?
Quantum physicist Erwin Schrödinger pointed out that life defies the cosmic energy tide courtesy of our sun. As long as the sun keeps shining, the earth is not a closed system. Photochemical reactions on earth capture a tiny fraction of the sun's radiant energy and use it to complicate things. In his famous "What Is Life?" lecture in 1945, Schrödinger remarked that living organisms eat "negative entropy." The relationship between negative entropy and information, like Boole's obscure algebra, was just waiting to be found when Shannon started to wonder how messages manage to maintain their order in a medium where disorder is often high.
The matter of devising a simple code and reliably transmitting it from place to place was very important to British cryptographers, and Shannon had done his own work in cryptography. The prediction of the behavior of electrical circuits used to transmit messages made of these codes was another of Shannon's interests. When he put it all together with a formal examination of how messages can be distinguished from noise, and found that the very equation he sought was a variation of the defining equation for entropy, Claude Shannon happened upon the fact that the universe plays twenty questions with itself.
The formal foundations of information theory were laid down in two papers in 1948, and at their core were fundamental equations that had a definite relationship to Boltzmann's equations relating entropy to the degree of order in a system. But the general idea behind the equations was simple enough for Shannon to suggest a game as a way of understanding the quantitative dimension of coding and communication.
The game is a mundane version of "twenty questions." In the case of the English alphabet, it turns out to be a game of "five questions." Player number one thinks of a letter of the alphabet. Player number two tries to guess the letter, using only questions like "is it earlier than L in the alphabetical sequence?" It is a strictly yes-or-no game, in which only one of two possible answers applies at every move.
Shannon pointed out that it takes a maximum of five questions to locate any of the thirty symbols necessary for making English sentences. If the sequence of yes or no decisions needed to specify the correct letter is converted into a sequence of zeroes and ones or a sequence of on and off impulses, or any other kind of binary symbol, you have a code for communicating the alphabet--which is, in fact, the basis of the code used for transmitting teletypewriter messages.
This game can be visualized as a tree structure, where each letter is the only leaf on a branch that branches off a branch that eventually branches off a trunk. Or it can be seen as a garden of forking paths, where each path is a sequence of one-way-or-the-other decisions, and the location of any endpoint can be coded by specifying the sequence of decisions along the path. It is also a good way to locate an address in a computer's memory or to encode an instruction to be placed in that location. The basic element in this game-tree-code, the binary decision, became the basis for Shannon's measure of information--the bit. Whenever computer enthusiasts speak of a "bit," they are referring to one of those decisions in the garden of forking paths.
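The game is short enough to write down as code. The sketch below is an illustration of the principle rather than the actual teletypewriter code, which assigned its bit patterns differently: it asks the yes-or-no questions for a set of about thirty symbols and records the answers as zeroes and ones.

    import math
    import string

    symbols = list(string.ascii_uppercase) + [" ", ".", ",", "?", "!"]
    print(math.ceil(math.log2(len(symbols))))    # five questions suffice

    def find(letter):
        target = symbols.index(letter)
        low, high = 0, len(symbols)
        answers = ""
        while high - low > 1:
            middle = (low + high) // 2
            if target < middle:      # "Is it earlier in the sequence?"
                answers += "0"
                high = middle
            else:
                answers += "1"
                low = middle
        return answers

    print(find("A"), find("Q"), find("?"))     # e.g. 0000 10001 11110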
Note that each decision, each bit, reduces the uncertainty of the situation, whether you are designating turns in a pathway or numbers in a guessing game or the energy state of molecules in a container. But what if you were to use a different strategy to guess the right answer? What if you just named each of the possible letters, one at a time, in a sequence or randomly? This relates to probability theory, the mathematical principles governing the random selection of small samples from large populations.
The relative probability of an event occurring, whether it is the probability of a molecule being hot or the probability of a symbol being a specific letter of the alphabet, depends upon the total number of cases in the population and the frequency of the specified event. If there are only two cases in the population, a single yes or no decision reduces the uncertainty to zero. In a group of four, it takes two decisions to be sure. In a group of trillions, you have to guess a little. When you are making predictions about such large populations, averages based on the overall behavior of the population have to replace precise case-by-case calculations based on the behavior of individual members of the population.
One of the properties of a statistical average is that it is quite possible for a population to be characterized by an average value that is not held by any particular element of the population. If you have a population consisting of three people, and you know that one is three feet tall, one five feet tall, and one is six feet tall, you have quite precise information about that population, which would enable you to pick out individuals by height. But if all you know is that the average height of the population is four feet, eight inches, you wouldn't know anything useful about any one of the three particular individuals. Whenever a system is represented by an average, some information is necessarily lost, just as two energy states lose a little energy when they are brought into equilibrium.
Whenever you move from an average measure to a precise measure, you have reduced uncertainty about that population. And that reduction in uncertainty is where the statistical properties that govern the motions of populations of molecules are connected to the statistical properties of a binary code, where entropy meets information. To see how uncertainty can relate to a binary code, think about a game of twenty questions. If the object of the game is to guess a number between one and one hundred, and player one asks if the number is larger than fifty, an answer from player two (no matter if it is yes or no) reduces player one's uncertainty by one half. Before asking the question, player one had one hundred possible choices. After asking that single yes or no question, player one knows either that the number is greater than fifty or that it is fifty or less.
One of the things Shannon demonstrated in 1948 was that the entropy of a system is represented by the logarithm of the number of possible combinations of states in that system--which is the same as the number of yes-or-no questions that have to be asked to locate one individual case. Entropy, as it was redefined by Shannon, is the same as the number of binary decisions necessary to identify a specific sequence of symbols. Taken together, those binary decisions, like the answers in the game, constitute a definite amount of information about the system.
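Shannon's measure can be stated in a single line of code. The formula, H equals minus the sum of p times log2 p over the possible symbols, is his; the little demonstration wrapped around it is only an illustration, showing that for N equally likely symbols the entropy reduces to log2 N, the number of yes-or-no questions.

    import math

    def entropy(probabilities):
        return sum(-p * math.log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))        # 1.0 bit: a single coin-flip question
    print(entropy([0.25] * 4))        # 2.0 bits: two questions
    print(entropy([1 / 32] * 32))     # 5.0 bits: the five-question alphabet
    print(entropy([1.0]))             # 0.0 bits: no uncertainty to remove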
When it comes to arranging molecules, living organisms seem to have a great deal of information about how to take elementary substances and turn them into complex compounds. Somehow, living cells manage to take the hodgepodge of molecules found in their environment and arrange them into the substances necessary for sustaining the life of the organism. From a disorderly environment, living creatures somehow create their own internal order. This remarkable property sounds suspiciously like Maxwell's demon. The answer, as we now know, is to be found in the way the DNA molecule arranges its elements--doing so in such a way that the processes necessary for metabolism and reproduction are encoded. The "negative entropy" that Schrödinger said is the nourishment of all life is information, and Shannon showed exactly how such coding can be done--in molecules, messages, or switching networks.
It has to be said, by the way, that Shannon was reluctant to use the word "entropy" to represent this measure implied by his equations, but von Neumann told him to go ahead and use it anyway, because "since nobody knows what entropy is, in a debate you will be sure to have an advantage."
Remember that entropy is where Shannon ended up, not where he started. Hot molecules and DNA were far from his original intention. He got to the guessing game and the notion of bits and the relationship between uncertainty and entropy because he looked closely at what a message really is. How does a signal that conveys information differ from everything else that happens? How much energy must be put into broadcasting a voice over the radio to be sure that it will be understood despite atmospheric interference or static from other sources? These were the questions that Shannon set out to answer.
Shannon's 1948 publication ("A Mathematical Theory of Communication") presented a set of theorems that were directly related to the economical and efficient transmission of messages on noisy media, and indirectly but still fundamentally related to the connection between energy and information.[2] Shannon's work was a direct answer to an engineering problem that had not decreased in importance since the war: how can messages be coded so that they will be reliably transmitted and received over a medium where a certain amount of noise is going to garble reception?
Shannon showed that any message can be transmitted with as high a reliability as one wishes, by devising the right code. The only limit imposed by nature is the capacity of the communication channel. As long as there is a channel, no matter how noisy, a code can be devised to transmit any message with any degree of certainty. Entropy is a measure of the relationship between the complexity of the code and the degree of certainty. These theorems meant a lot to radio and telephone engineers, and made color television as well as broadcasts from the moon possible, but Shannon stated them in a way that demonstrated their universality beyond the domain of electrical engineering.
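The crudest possible example of such a code, offered as a stand-in for the far more efficient codes Shannon's theorems guarantee, is simply to send every bit three times and let the receiver take a majority vote. Even this wasteful scheme shows the trade: more redundancy in the code, fewer errors surviving the noise.

    import random

    random.seed(0)
    NOISE = 0.1    # each transmitted bit is flipped with 10% probability

    def channel(bits):
        return [b ^ (random.random() < NOISE) for b in bits]

    def encode(bits):
        return [b for b in bits for _ in range(3)]   # repeat each bit thrice

    def decode(bits):
        return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

    message = [random.randint(0, 1) for _ in range(10000)]
    plain = sum(m != r for m, r in zip(message, channel(message)))
    coded = sum(m != r for m, r in zip(message, decode(channel(encode(message)))))

    print(plain / len(message))   # about 0.10 of the bits arrive garbled
    print(coded / len(message))   # about 0.03: reliability bought with length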
The key to life itself, in fact, turned out to be a matter of information, as the world learned five years later, when that young physicist-turned-biologist who had attended Schrödinger's lecture, Francis Crick, teamed up with James Watson to decipher the molecular genetic coding of the DNA helix. Scientifically, and on the level of popular consciousness, people seemed to jump rather too quickly to make the transition from an energy-based metaphor of the universe to an information model. The rush to generalize information theory to all sorts of scientific areas, some of them of dubious scientific merit, led Shannon to decry this "bandwagon effect," remarking that information theory "has perhaps ballooned to an importance beyond its actual accomplishments. . . . Seldom do more than a few of nature's secrets give way at one time."[3]
Despite Shannon's disclaimer, information- and communication-based models have proved to be enormously useful in the sciences because so many important phenomena can be seen in terms of messages. Human bodies can be better understood as complex communication networks than as clockwork-like machines. The error-correcting codes guaranteed by Shannon's "noisy channel" theorem are just as useful for genetic control of protein synthesis as for protocols in a computer network. Shannon's MIT colleague, Noam Chomsky, has used a similar tool in his exploration of the "deep structure" of language.[4]
With all these higher-level abstractions, Shannon did not abandon all thought of the potential of digital computers. Where Wiener saw the computer as a self-controlling mechanism and von Neumann saw a device with logical as well as mathematical properties, Shannon tended to think of ENIAC and UNIVAC as information processing machines.
Like Turing and other mathematicians since then, Shannon was fascinated with the idea that something as sophisticated and essentially human as chess playing could, in theory, be emulated by some future version of these devices. In February 1950, Shannon published "A Chess Playing Machine" in Scientific American. Half a decade before anyone dared to name the endeavor "artificial intelligence research," Shannon pointed out what very few people then recognized--that electronic digital computers could "be adapted to work symbolically with elements representing words, propositions or other conceptual entities."
A chess game is a Turing machine. And a universal Turing machine, given the properly coded rules, ought to be able to play chess. Shannon pointed out that the way most people would design a machine to play chess--mechanically examining each alternative move and evaluating it, the so-called brute-force method--would be virtually impossible, even on the fastest imaginable computer. He estimated that a typical chess game has about 10^120 possible variations, so "A machine calculating one variation each millionth of a second would require over 10^95 years to decide on its first move!"
This "combinatorial explosion"--the rapid and overwhelming buildup of alternatives in any system in which each level leads to two or more deeper levels--was another one of those secrets of nature that Claude Shannon was in the habit of turning up. The explosive expansion of the number of alternative decisions is a barrier that confronts any attempt to exhaustively examine a branching structure, and continues to confront programmers who seek to emulate cognitive functions by performing searches through problem spaces.
Turing and Shannon were altogether serious in their interest in chess, because of the complexity of the game in relation to the simplicity of its rules, and because they suspected that the shortcut needed to get around this kind of time-consuming search procedure would also be a clue to the way brains solve all sorts of problems.
A chess playing program was also interesting because it was a relative of the kind of informational entities known as automata that von Neumann and Turing had been toying with. Once again, like Turing's universal machines, these automata were theoretical devices that did not exist at that time, but were possible to build, in principle. For years, Shannon experimented with almost absurdly simple homemade versions--mechanical mice that were able to navigate simple mazes.
In 1953, Shannon wrote a paper, "Computers and Automata," in which he posed questions that continue to be of acute interest to psychologists as well as computerists.[5] Can a chess playing computer learn from its mistakes? Is it possible to build a machine that can diagnose itself and repair its own malfunctions? Can computer programs ("virtual machines") be created that enable computers to write their own software to the specifications of the human user? Can the way human brains process information (known in some hard-core AI circles as "wetware") ever be effectively simulated by hardware and software?
In the summer of 1953, while he was working on these ideas, Shannon hired two temporary laboratory assistants named Minsky and McCarthy, another pair of prodigies who knew some fancy mathematics and thought they could do big things with computers. Here were the first members of the first native generation of computer scientists, the ones who already knew about electronics and cybernetics and information theory and brain physiology and were looking for something ambitious to do with it all. They ended up in the right place when they dug up Shannon in the midst of Bell Laboratories.
Shannon had long spoken of his suspicion that the future evolution of more sophisticated computer hardware would make it possible to construct software capable of simulating some parts of human cognition. But these younger guys were blatant believers. They were out to build an intelligence, and didn't mind saying so. McCarthy and Shannon edited a book on automata, and three years later, in 1956, Shannon joined Minsky, McCarthy, and an IBM computer researcher, Nathaniel Rochester, in sponsoring a summer conference at Dartmouth College, to set goals for this new field. The new field they gathered to discuss was a branch of science that did not yet have a name, but which was founded on the assumption that the existence of computers now made it possible to consider creating an artificial version of the most complex system known to science--human intelligence.
It was around 1956 that McCarthy started using the words "artificial intelligence." The Dartmouth Conference was the constitutional convention of the artificial intelligence faction, and it was also the place where two virtually unknown RAND programmers named Allen Newell and Herbert Simon breezed in from Santa Monica with a piece of software they wrote with Cliff Shaw. To everyone's astonishment, it was a program--the famous Logic Theorist, which could prove theorems from Russell and Whitehead's Principia Mathematica--that actually did what the rest of them thought they were there to plan to do.
Hopes were high for the AI rebels in 1956 and 1957. Major efforts were under way and ambitious goals were in sight. A very few unorthodox thinkers staked their careers on the conviction that this branch of computer science, formerly a branch of science fiction, would soon be seen as more important than anything else humankind had ever attempted: Minsky remained at MIT and concentrated on the problem of how knowledge is represented in minds and machines; Newell and Simon (now a Nobel laureate) began their long association with one another and with Carnegie-Mellon University, where they concentrated on the information processing approach to psychology and AI design; McCarthy created LISP, a language specifically for conducting AI research, and left MIT to preside over Stanford's AI laboratory.
Claude Shannon went back to his chess playing machines and continued building the mechanical mice that could learn how to run simple mazes. In 1956, Robert Fano, the electrical engineering student who witnessed Norbert Wiener's "Entropy is information!" exclamations back in the summer of 1947, brought Shannon to MIT from Bell Laboratories.
His professional standing was so far beyond reproach that his occasional unicycle excursions through MIT halls, and his reluctance to lecture or publish frequently, hardly dented his reputation. In fact, his reputation had reached such mythological proportions that he had to start writing disclaimers. Fame wasn't something he wanted or needed. By 1960, he didn't even come to the office.
In the 1960s Shannon became interested in the stock market as a real-world experiment in probability theory, and rumor has it that he didn't do too badly. He began to seriously extend his analysis of communications and messages to the English language. Nobody but Shannon knows the full extent of his discoveries. Robert Fano (who went on to become the administrative director of Project MAC) recently said this of Shannon:[6]
There is a significant body of work he did in the 1950s that has never been printed. He doesn't want someone else to write his papers for him, and he won't write them himself. It's as simple and as complicated as that. He doesn't like to teach. He doesn't like giving lectures. His lectures are jewels, all of them. They sound spontaneous, but in reality they are very, very carefully prepared.
In the early sixties, one of the extremely few students Shannon personally took on, another MIT-bred prodigy by the name of Ivan Sutherland, made quite a splash on the computer science scene. By the mid-1970s, Shannon, now in his sixties, had become a literal gray eminence. By the early 1980s, he still hadn't stopped thinking about things, and considering his track record, it isn't too farfetched to speculate that his most significant discoveries have yet to be published.
In the late 1950s, around the time Shannon began to retreat from public life, the artificial intelligence pioneers began to stake out ambitious territories for their laboratories--goals like automatic theorem-proving programs, or knowledge-representation languages, or robotics--and it began to be possible to dream of computers that could be used as laboratories for running experiments in new kinds of AI programs. Then fate put a little pressure on the story once again.
This time, it was not a war, but an implicit threat of war. The space race and the computer revolution were ready to be launched by 1957, and the information processing devices pioneered by the World War II creators of computing were ready to leave the laboratories and begin to infiltrate the real world. As usual, things started popping when an MIT professor stumbled onto something big.