Biological Computing -------------------- Biological computing is an extremely young field which attempts to extract computing power from the collective action of large numbers biological molecules. One can loosely think of a biological computer as a massively parallel machine where each processor consists of a single biological macromolecule. By employing extremely large numbers of such macromolecules in parallel, one can hope to solve computational problems more quickly than the fastest supercomputers. The most promising proposals so far for biological computers involve the very same molecule that encodes our own genetic identity, namely DNA. Indeed the first proposal for a biological computer was given by Leonard Adelman in 1994, when he showed how to use DNA to solve the NP-complete Directed Hamiltonian Path (DHP) problem in graph theory. In order to explain this seminal work, we first discuss relevant information about DNA from the field of molecular biology. A strand of DNA is a polymer sequence of exactly four types of nucleotides denoted by the letters A,C,G and T standing for adenine, cytosine, guanine and thymine respectively. Each nucelotide has its own complement, with which it readily bonds. Cytosine and Guanine are complementary as are Adenine and Thymine. Two strands are said to be complementary if each and every nucleotide in one strand complements the corresponding nucleotide in the other. Two complementary strands can of course bond, and the resulting double stranded DNA molecule takes on the shape of the famous double helix. A few well known techniques in molecular biology for manipulating DNA turn out to be crucial for the operation of most DNA computers. The first technique, DNA sequencing, enables biologists to read off the sequence of nucleotides in a piece of DNA. This can be thought of as reading the output of any given DNA computation. The fundamental computational step of a DNA computer can be thought of as a filtering operation in which a given collection of strands of DNA can be separated into two parts, one of which contains a prespecified subsequence and one that doesn't. The technique used to accomplish this separation is known as Biotin-avidin affinity purification. Additionaly, gel electrophoresis can be used to separate out DNA strands based on their length. Finally Polymerase Chain Reaction (PCR) can make copies of a strands of DNA, a step that is useful in reducing error on a DNA computer. Leonard Adelman opened up the whole field of biological computing by using the above well known facts and techniques in his solution of the DHP problem. In computer science, a graph consists of a set of nodes, some of which may be connected by edges directed from one node to another. A directed Hamiltonian path in a graph is a path that visits all of the nodes one by one without ever visiting a node more than once. The DHP problem is to ascertain for any given graph whether or not it has a Hamiltonian path. Adelman solved this problem by ingeniously couching it in the language of DNA. Each node is encoded by a DNA strand of 2N nucleotides. N merely has to be large enough so that there are enough strands to go around for each of the nodes. The first N nucleotides representing a node will be called the "arrival name" for the node. The second N nucleotides will be called the "departure name" for the node. This set of strands, each of length 2N representing the nodes of the graph are called the "node strands." A directed edge in a graph is given by specifying the beginning and ending nodes. To each such edge is associated another strand of DNA called the "edge strand." An edge strand is simply a sequence of 2N nucleotides, the first N of which are the departure name for the node the edge departs from, and the second N of which are the arrival name for the node the edge arrives at. Thus all the nodes and edges of the graph are represented by strands of length 2N, and we will see shortly that the use of these strands is enough to solve the DHP problem in a DNA computer. The first step in the computation is to create a large number of copies of the edge strands, and the complements of the node strands. These are all mixed in liquid solution. Now consider two edge strands where the arrival node of the first edge strand equals the departure node of the second edge strand. These two edges could possibly be traversed in order as part of a larger Hamiltonian path. Furthermore consider what happens when these two edge strands meet the complement of the node strand corresponding to the node common to both edges. By construction, this node strand complement has exactly the nucleotide sequence necessary to attach the two edge strands together. Chasing through the definitions, it is clear that the first N nucleotides of the complementary node strand complement the last N nucleotides of the first edge strand. Similarly, the last N nucleotides of the complementary node strand complement the first N nucleotides of the second edge strand. After all these bonds form, the complementary node strand acts like the glue that glues the two edge strands together. In this fashion, longer and longer paths form, with two compatible edge strands always glued together by the complement of the node strand common to both edges. All of the above reactions occur within seconds and the resulting DNA solution contains many strands of DNA all corresponding to different paths. Not all of these will be Hamiltonian paths however. The Hamiltonian ones can be isolated as follows. First of all gel electrophoresis can be used to select out only those strands that are of the correct length, i.e. strands that are of length 2N times the number of nodes in the graph. Only these can be proper candidates for a DHP. Second it must be checked that these select paths visit every node once. This can be done using Biotin-avidin purification applied to each complementary node strand. Every strand that does not contain a particular complementary node strand is thrown out. When this is done for all nodes in the graph, the only strands remaining are strands encoding DHP's. This specific example illustrates the general principle behind most biological computation paradigms in which the massively parallel action of a large number of molecules is used to solve hard problems. A single liter of water can dissolve about 1018 strands of DNA each of length on the ordre of 10,000 nucleotides. This represents and extraordinary amount of processing power and memory storage. Furthermore, since Adelman's proposal, other scientists have proposed methods to solve a few other combinatorial problems using DNA, most notably the satisfiability of boolean formulas shown by R. Lipton. Theoretical proposals have also been put forth to simulate cellular automata, to break the Data Encryption Standard, and even to simulate Turing machines, proving that DNA computation is universal. However, these advances are mainly theoretical, and efficient implementation and error control issues also continue to be leading areas of research focusing on making biological computing more and more practical.