Although human cells contain about a thousand times more DNA than bacterial cells, the best estimates are that humans have only about 20 times more genes than bacteria. This suggests that the vast majority of eukaryotic DNA is apparently nonfunctional. At first this seems contradictory: if a more complicated organism needs more DNA, why doesn't it have proportionally more genes? In fact, the DNA content of an organism doesn't correlate well with its complexity; the most DNA per cell occurs in a fly species. Other arguments suggest that the number of genes in an organism may have an upper limit, because too many genes means too many opportunities for mutations. Current estimates say that humans have about 100,000 separate mRNAs, which implies about 100,000 expressed genes. This number is still lower than the coding capacity of the unique (single-copy) DNA fraction in an organism. Taken together, these arguments lead to the conclusion that the vast majority of cellular DNA isn't functional.
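The arithmetic behind this argument can be made explicit. The short Python sketch below uses only the two rough ratios quoted above (a thousandfold difference in DNA content and a twentyfold difference in gene number); the variable names are illustrative, and the result is an order-of-magnitude estimate rather than a measurement.

```python
# Rough ratios quoted in the text (human relative to a bacterium).
dna_ratio = 1000   # about 1,000 times more DNA
gene_ratio = 20    # about 20 times more genes

# If gene number scaled with DNA content, the two ratios would be equal.
# Their quotient is how much more DNA there is per gene in humans.
dna_per_gene_ratio = dna_ratio / gene_ratio

print(f"Human DNA per gene is about {dna_per_gene_ratio:.0f} times the bacterial value")
```

Under these assumptions, a human gene is accompanied by about 50 times more DNA than a bacterial gene. Unless human genes were 50 times longer than bacterial genes, most of that extra DNA cannot be coding sequence.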
Genes that are expressed usually contain introns, noncoding sequences that interrupt the coding sequences. A typical eukaryotic gene thus consists of a set of sequences that appear in the mature mRNA (called exons) interrupted by introns. The regions between genes are likewise not expressed, but they may help with chromatin assembly, contain promoters, and so forth. See Figure 1.
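The exon and intron arrangement can be pictured as a simple data structure: a transcript plus a list of exon coordinates. The Python sketch below is a toy model, not a description of the cellular splicing machinery. The sequence, the coordinates, and the function name `join_exons` are all hypothetical, chosen only to show that the mature mRNA is the exons joined in order.

```python
def join_exons(pre_mrna, exons):
    """Return the exon segments joined in order.

    `exons` is a list of (start, end) pairs using 0-based,
    end-exclusive positions in `pre_mrna`.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical precursor: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguucuag" + "UUUGGA" + "guacgcag" + "CCCUAA"

# Exon coordinates for the made-up precursor above.
exons = [(0, 6), (17, 23), (31, 37)]

mature_mrna = join_exons(pre_mrna, exons)
print(mature_mrna)  # AUGGCCUUUGGACCCUAA
```

Everything outside the listed coordinates (the lower-case intron segments) is absent from the result, which is the sense in which introns "interrupt" the gene.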
Intron sequences share a few common features. Most introns begin with the sequence GT (GU in RNA) and end with the sequence AG. Otherwise, very little similarity exists among them. Introns may be large relative to coding sequences; in some genes, over 90 percent of the sequence between the 5′ and 3′ ends of the mRNA precursor consists of introns. RNA polymerase transcribes intron sequences along with the exons. This means that eukaryotic mRNA precursors must be processed to remove introns as well as to add the cap at the 5′ end and the polyadenylic acid (poly A) sequence at the 3′ end.
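Both observations in this paragraph, the GT...AG boundary pattern and the large share of a transcript that introns can occupy, are easy to express as small checks. The following Python sketch uses hypothetical helper names (`follows_gt_ag_rule`, `intron_fraction`) and made-up lengths. It tests only the two boundary dinucleotides, which is far less than a real splice-site predictor would require.

```python
def follows_gt_ag_rule(intron_dna):
    """True if a DNA intron sequence starts with GT and ends with AG."""
    seq = intron_dna.upper()
    # Require at least 4 bases so the two dinucleotides cannot overlap.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def intron_fraction(exon_lengths, intron_lengths):
    """Fraction of the transcribed region that is made up of introns."""
    total = sum(exon_lengths) + sum(intron_lengths)
    return sum(intron_lengths) / total


# Made-up intron sequences for illustration.
print(follows_gt_ag_rule("GTAAGTCCTTTCAG"))  # True
print(follows_gt_ag_rule("ATAAGTCCTTTCAC"))  # False

# Made-up gene: three exons totaling 500 bases, two introns totaling 5,000.
fraction = intron_fraction([150, 200, 150], [2000, 3000])
print(f"{fraction:.0%} of the precursor is intron")  # 91% of the precursor is intron
```

Because most introns share little beyond these boundary bases, a check like this would flag many sequences that are not introns at all; it illustrates the rule rather than providing a way to find introns.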
Eukaryotic genes may be clustered (for example, genes for a metabolic pathway may occur in the same region of a chromosome) but are independently controlled. With rare exceptions, operons and polycistronic mRNAs do not exist in eukaryotes. This contrasts with prokaryotic genes, where a single control gene often acts on a whole cluster (for example, lacI controls the synthesis of β‐galactosidase, permease, and acetylase).
One well‐studied example of a clustered gene system is the mammalian globin genes. Globins are the protein components of hemoglobin. In mammals, specialized globins exist that are expressed in embryonic or fetal circulation. These have a higher oxygen affinity than adult hemoglobins and thus serve to “capture” oxygen at the placenta, moving it from the maternal circulation to that of the developing embryo or fetus. After birth, the familiar mature hemoglobin (which consists of two alpha and two beta subunits) replaces these globins. Two globin clusters exist in humans: the alpha cluster on chromosome 16, and the beta cluster on chromosome 11, as shown in Figure 2.
These clusters, and the gene for the related protein myoglobin, probably arose by duplication of a primordial gene that encoded a single heme‐containing, oxygen‐binding protein. Within each cluster is a gene designated with the Greek letter Ψ. These are pseudogenes: DNA sequences that are related to a functional gene but contain one or more mutations, so that they are not expressed.
The information problem of eukaryotic gene expression therefore consists of several components: gene recognition, gene transcription, and mRNA processing. These problems have been approached biochemically by analyzing the enzyme systems involved in each step.