Complexity in Biochemical Genetics

At first glance, the subject of biochemical genetics can seem incomprehensibly complicated. How can a cell's genes possibly contain all the information about its capabilities for metabolism, macromolecular interactions, and responses to stimuli?

This question was answered, incorrectly, in the 1930s, when biochemists concluded that the protein components of chromosomes had to carry genetic information. Scientists considered the DNA in chromosomes too simple a structure to be anything other than a scaffold. But in the 1940s, experiments by Avery, MacLeod, and McCarty showed that this view was wrong: their work with bacteria showed that DNA carried the information for a heritable trait. This result forced a rethinking of how information is carried in biology, and only when the Watson‐Crick structure of DNA was proposed did it become clear how a “simple” molecule could carry information from one generation to the next. Although DNA is built from only four kinds of subunits, information is carried by the linear sequence of those subunits along the long DNA chain, just as the sequence of letters defines the information in a word of text.

The possible information contained in a biomolecule is termed its complexity. In molecular biology and biochemistry, complexity is defined as the number of different sequences in a population of macromolecules. Even a relatively small polymer has an enormous number of potential sequences. DNA, for example, is built from only four monomers: A, C, G, and T. Because each position in a chain can hold any of the four, there are 16 possible dimers (4 × 4) and 64 possible trimers (4 × 4 × 4). In general, a DNA chain of length N can have 4^N possible sequences.
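The 4^N rule can be checked by brute force. The short Python sketch below lists every chain of a given length over the alphabet A, C, G, T and compares the count with the closed form. The name all_sequences is a hypothetical helper chosen for illustration; enumeration like this is only practical for small N, since the list grows fourfold with each added nucleotide.

```python
from itertools import product

# The four DNA monomers named in the text.
NUCLEOTIDES = "ACGT"

def all_sequences(n):
    """Hypothetical helper: enumerate every possible DNA chain of length n."""
    # product(..., repeat=n) picks one nucleotide independently for each position.
    return ["".join(chain) for chain in product(NUCLEOTIDES, repeat=n)]

for n in range(1, 5):
    seqs = all_sequences(n)
    # The brute-force count should equal the closed form 4**n.
    print(n, len(seqs), 4 ** n)
```

For n = 2 this lists the 16 dimers AA, AC, AG, AT, CA, and so on, matching the 4 × 4 count in the text.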

Even a relatively small DNA chain can carry a large amount of information. For example, the DNA of a small virus 5,000 nucleotides long could have any of 4^5,000 possible sequences. This is a huge number, approximately 10^3,010, or 1 with 3,010 zeroes after it. (By comparison, the number of elementary particles in the universe is estimated at about 10^80, or 1 with 80 zeroes after it.) Yet the virus has only one DNA sequence: out of this vast number of possibilities, a single sequence has been selected to encode the virus's biochemical functions. In other words, the DNA sequence carries information, and the virus packs a large amount of it into a small space.
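The arithmetic behind the virus example can be reproduced exactly, because Python integers have arbitrary precision. This sketch computes 4^5,000 in full, counts its digits, and checks the logarithm that gives the "3,010 zeroes" estimate. The variable names are illustrative only.

```python
import math

N = 5_000  # length of the small viral genome used in the example

possible = 4 ** N          # exact count of possible sequences
digits = len(str(possible))

print(digits)                         # prints 3011
print(round(N * math.log10(4), 2))    # prints 3010.3

# The count dwarfs the ~10**80 estimate for particles in the universe.
print(possible > 10 ** 80)            # prints True
```

Since log10(4^5,000) is about 3010.3, the number lies between 10^3,010 and 10^3,011 (it is roughly 2 × 10^3,010), which is why "1 with 3,010 zeroes" is a fair order-of-magnitude description.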

This concept of information resembles the memory of a computer, which is made up of tiny semiconductor switches, each with two positions: on and off. The ability of computers to do an ever‐increasing number of tasks depends on engineers designing chips that pack more and more switches into a small space. Similarly, the ability of cells to carry out so many biochemical tasks depends on the large number of DNA nucleotides packed into the small space of the chromosomes.
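The switch analogy can be made quantitative under a simple counting assumption: a switch has 2 states and a nucleotide position has 4, so one nucleotide holds as much as two switches (log2 4 = 2 bits). The sketch below ignores how cells or chips physically encode data; bits_for_dna is a hypothetical helper.

```python
def bits_for_dna(n_nucleotides):
    """Hypothetical helper: on/off switches needed to match n nucleotides.

    4**n == 2**(2*n), so choosing one of 4**n sequences is equivalent
    to setting 2*n binary switches.
    """
    return 2 * n_nucleotides

bits = bits_for_dna(5_000)
assert 4 ** 5_000 == 2 ** bits   # same number of possible states

print(bits)        # prints 10000
print(bits // 8)   # prints 1250 (bytes)
```

On this count, the 5,000-nucleotide virus genome holds as many distinct states as 10,000 memory switches.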