The naturally occurring amino acids have a common structure. Amino acids, as the name implies, have two functional groups, an amino group (–NH₂) and a carboxyl group (–COOH). These groups are joined to a single (aliphatic) carbon. In organic chemistry, the carbon directly attached to a carboxyl group is the alpha (α) position, so the amino acids in proteins are all alpha‐amino acids. The side chains that distinguish one amino acid from another are attached to the alpha carbon, so the structures are often written as shown in Figure , where R stands for one of the 20 side chains:
The amino acids found in proteins have a common stereochemistry. In the structure illustrated in Figure , the amino group is always to the left side of the alpha carbon. In organic chemistry, this stereochemistry is referred to as L (for levo, meaning left). Thus, the amino acids found in proteins are L‐alpha amino acids. (Biochemists, being creatures of habit, usually do not refer to amino acid stereochemistry in the R and S nomenclature.) A few D‐amino acids are found in nature, although not in cellular proteins. (The D comes from dextro, meaning right.) For example, some peptide antibiotics, such as bacitracin, contain D‐amino acids.
The carboxyl and amino groups of the amino acids can, respectively, donate a proton to and accept a proton from water. Both exchanges occur in solution, so the amino acids form doubly ionized species, termed zwitterions (from German zwei, meaning two). The formation of zwitterions can be rationalized from the principles of acid‐base chemistry. The strongest acid that can exist in water is the conjugate acid of water, the hydronium ion, H₃O⁺. Carboxylic acids are stronger acids than water, so the carboxyl group of an amino acid (pKa near 2) will donate a proton to water. Similarly, α‐amino groups (pKa greater than 9) are stronger bases than water and will accept a proton from water. Amino acids in water, therefore, have the general structure ⁺H₃N–CHR–COO⁻.
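As a rough check on the acid‐base argument, the Henderson–Hasselbalch relation gives the ratio of the deprotonated to the protonated form of each group. The values used here (pH 7, pKa 2 for the carboxyl group, pKa 9 for the amino group) are illustrative round numbers based on the approximate figures quoted in the text, not measured constants for any particular amino acid.

```latex
\[
\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\text{base}]}{[\text{acid}]}
\quad\Longrightarrow\quad
\frac{[\text{base}]}{[\text{acid}]} = 10^{\,\mathrm{pH}-\mathrm{p}K_a}
\]
\[
\text{carboxyl: } \frac{[\mathrm{-COO^-}]}{[\mathrm{-COOH}]} = 10^{7-2} = 10^{5}
\qquad
\text{amino: } \frac{[\mathrm{-NH_2}]}{[\mathrm{-NH_3^+}]} = 10^{7-9} = 10^{-2}
\]
```

Under these assumptions the carboxyl group is almost entirely deprotonated and the amino group is about 99% protonated, which is the doubly ionized zwitterion.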
The side chains of amino acids give them their different chemical properties and allow proteins to have so many different structures. How many proteins are possible? Protein chains generally vary in size from 100 to 1,000 amino acids in length. Even if limited to the smallest chain length, there would be 20¹⁰⁰, over 10¹³⁰ (that is, 1 with 130 zeroes after it), possible primary structures. (Again, remember that the number of elementary particles in the universe is estimated to be 10⁸⁰.) Obviously, not all these potential proteins exist in nature. Instead, the primary structures of proteins are related to each other, and almost all proteins have homologues, that is, other proteins sharing a common ancestor.
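The size of this number is easy to verify. The following is a minimal sketch that counts the sequences for a 100‐residue chain built from 20 side chains; the constant names are chosen here for illustration only.

```python
import math

AMINO_ACIDS = 20      # number of side chains named in the text
CHAIN_LENGTH = 100    # smallest chain length quoted in the text

# Python integers have arbitrary precision, so this count is exact.
sequences = AMINO_ACIDS ** CHAIN_LENGTH

# Order of magnitude: log10(20^100) = 100 * log10(20)
exponent = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

print(f"20^100 has {len(str(sequences))} decimal digits")
print(f"20^100 is roughly 10^{exponent:.1f}")
```

A number with 131 digits lies between 10¹³⁰ and 10¹³¹, consistent with the "over 10¹³⁰" estimate.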
What homologues are possible? In general, homologous proteins share some short amino acid sequences exactly. In other cases, the differences result in the substitution of one amino acid side chain by another chemically similar one. Six classes of amino acid side chains exist; within a class, the amino acid side chains are chemically similar. Substitution of one amino acid side chain for another one within the same class is known as conservative substitution. Homologous proteins are related by conservative amino acid substitutions, as in Figure . Although nonconservative substitutions are tolerated at some positions in the primary sequence of a protein, the general rule illustrated in Figure is followed when evaluating the relationship of two protein primary sequences. (The dashes indicate that all three proteins have the same amino acid at that position; these are highly homologous proteins, indeed!)
Aliphatic amino acids. The side chains of glycine, alanine, valine, leucine, and isoleucine, shown in Figure , contain saturated carbon‐carbon and carbon‐hydrogen bonds only. Thinking of glycine as containing a side chain can be somewhat confusing because the fourth substituent on the α‐carbon is only a single hydrogen atom. Alone among the 20 amino acids, glycine is not optically active; the D‐ and L‐ nomenclature is irrelevant. Alanine has a methyl group for its side chain, valine a 3‐carbon side chain, while leucine and isoleucine have 4‐carbon side chains.
Aromatic amino acids. Phenylalanine, tyrosine, and tryptophan contain ring systems. In order of increasing complexity, phenylalanine has a benzyl group, while tyrosine is phenylalanine with an added hydroxyl group in the para position relative to the methylene (–CH₂–) group. Tryptophan has two fused rings, one of which contains a nitrogen atom. The nitrogen is not ionizable at biologically relevant pH values.
Ionizable basic amino acids. Histidine, lysine, and arginine each have a nitrogen atom which, unlike the nitrogen of tryptophan, is ionized at the pH ranges found in the cell. Histidine has a five‐membered imidazole ring. One of the two nitrogen atoms has a pKa near 7.0. This means that, at the neutral pH values found in cells, about half of the histidine molecules will have their side chains protonated (that is, with a positive charge) and about half will have their side chains unprotonated and uncharged. Histidine is often used in enzymes to bind and release protons during the enzymatic reaction.
Lysine and arginine are almost fully ionized at the pH values found in the cell. Lysine's side chain pKa is greater than 9; therefore, it will be more than 99% protonated in the cell. Arginine's side chain is even more basic; its pKa is greater than 12. Therefore, the side chains of these amino acids carry a positive charge in the cell.
Carboxylate‐containing amino acids. Aspartic acid and asparagine have four carbons; glutamic acid and glutamine have five carbons in all. Aspartic acid has a carboxylic acid side chain, and asparagine has an amide side chain. Similarly, glutamic acid has a carboxylic acid side group, and glutamine has an amide group. The pKa values of the side chain carboxyl groups in aspartate and glutamate are near 4.0. Therefore, these side chain groups are almost fully ionized in the neutral conditions found in cells and are negatively charged.
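The ionization claims for the basic and acidic side chains can be checked numerically. The sketch below applies the Henderson–Hasselbalch relation at pH 7; the pKa values are round numbers taken from the text (with the "greater than" figures for lysine and arginine used as lower bounds), so the output is illustrative rather than measured data. The function and table names are made up for this example.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a titratable group that holds its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def mean_charge(pka: float, charge_when_protonated: int, ph: float) -> float:
    """Average charge of the group; losing the proton lowers the charge by one."""
    f = fraction_protonated(pka, ph)
    return f * charge_when_protonated + (1.0 - f) * (charge_when_protonated - 1)


# (assumed pKa, charge of the protonated form); illustrative values only
SIDE_CHAINS = {
    "histidine": (7.0, +1),
    "lysine": (9.0, +1),
    "arginine": (12.0, +1),
    "aspartate": (4.0, 0),
    "glutamate": (4.0, 0),
}

PH = 7.0  # neutral pH assumed for the cell
for name, (pka, charge) in SIDE_CHAINS.items():
    f = fraction_protonated(pka, PH)
    q = mean_charge(pka, charge, PH)
    print(f"{name:10s} protonated {f:7.2%}  mean charge {q:+.3f}")
```

With these inputs histidine comes out half protonated, lysine and arginine carry close to a full positive charge, and aspartate and glutamate carry close to a full negative charge.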
Serine and cysteine can be thought of as being related to alanine. Serine is alanine with a hydroxyl (–OH) group and cysteine is alanine with a sulfhydryl (–SH) group.
Threonine has four carbons, with a hydroxyl group on the beta carbon. The beta carbon is the one next to the alpha carbon (the alpha carbon has the amino group on it). The presence of the hydroxyl group means that the beta carbon of threonine is a chiral center, in addition to the alpha carbon. As the name suggests, the –OH group has the D configuration, or threo to the alpha carbon. (The other possible stereochemistry is erythro; think of the letter E to remember this term. The arms of the E point in the same direction.)
Methionine has a methyl group on its sulfur. The side chain of methionine has one more carbon between the alpha carbon and the sulfur than does that of cysteine. (Cysteine with an extra carbon is termed homocysteine; homocysteine is an intermediate in the biosynthesis of methionine.)
Proline is the odd one out among the amino acids. It has five carbons, with the alpha amino group bonded not only to the alpha carbon but also to the last side chain carbon. The cyclic side chain means that proline is conformationally rigid. That is, the carbon‐carbon bonds of proline cannot rotate freely in solution. Other amino acids are more flexible in solution.
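The side‐chain grouping described above lends itself to a simple lookup for testing whether a substitution is conservative. The sketch below is one way to encode it. The class names are labels chosen here, and placing serine, cysteine, threonine, and methionine in one class and proline in a class of its own is an assumption made to reach the six classes the text mentions.

```python
# Side-chain classes following the grouping in the text. The labels
# "hydroxyl_sulfur" and "cyclic" are assumed names, not the source's.
SIDE_CHAIN_CLASSES = {
    "aliphatic": {"Gly", "Ala", "Val", "Leu", "Ile"},
    "aromatic": {"Phe", "Tyr", "Trp"},
    "basic": {"His", "Lys", "Arg"},
    "carboxylate": {"Asp", "Asn", "Glu", "Gln"},
    "hydroxyl_sulfur": {"Ser", "Cys", "Thr", "Met"},
    "cyclic": {"Pro"},
}

# Reverse lookup: residue -> class name
CLASS_OF = {
    residue: name
    for name, members in SIDE_CHAIN_CLASSES.items()
    for residue in members
}


def is_conservative(a: str, b: str) -> bool:
    """True when two different residues belong to the same side-chain class."""
    return a != b and CLASS_OF[a] == CLASS_OF[b]


def compare(seq1: list[str], seq2: list[str]) -> list[str]:
    """Label each aligned position as identical, conservative, or nonconservative."""
    labels = []
    for a, b in zip(seq1, seq2):
        if a == b:
            labels.append("identical")
        elif is_conservative(a, b):
            labels.append("conservative")
        else:
            labels.append("nonconservative")
    return labels


print(compare(["Leu", "Lys", "Ser"], ["Ile", "Asp", "Ser"]))
```

Real sequence comparisons typically use graded scoring rather than a yes/no class test, so this is only a direct rendering of the rule as the text states it.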
Peptide bond. The peptide bond joins the carboxyl and amino groups of amino acids. When activated, carboxylic acids and amines form amides. Amino acids are bifunctional, with each amino acid having both amino and carboxyl groups. Peptides are composed of amino acids joined head to tail with amide bonds. Peptides are classified according to their chain length. Oligopeptides are shorter than polypeptides, although no defined transition exists between the two forms. The joining of amino acids to form a peptide bond occurs formally (although the mechanism of its formation is more complicated) in the following way:
Note that two amino acids can form a dipeptide (a peptide composed of two units) in either of two ways. For example, glycine and alanine can form glycylalanine (gly‐ala) or alanylglycine (ala‐gly):
No matter which arrangement occurs, each dipeptide will have one free amino group and one free carboxyl group. Peptide sequences are written in the direction from the amino to the carboxyl end.
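The formal bookkeeping of peptide formation can be sketched in a few lines: each peptide bond removes one water from the combined formulas of the amino acids, and the order of the residues (written from the amino end to the carboxyl end) distinguishes the two dipeptides. This describes only the net result, not the mechanism; the function and table names are hypothetical, and only glycine (C₂H₅NO₂) and alanine (C₃H₇NO₂) are included.

```python
from collections import Counter
from itertools import permutations

# Elemental compositions of the free amino acids
COMPOSITION = {
    "gly": Counter(C=2, H=5, N=1, O=2),
    "ala": Counter(C=3, H=7, N=1, O=2),
}
WATER = Counter(H=2, O=1)


def peptide_formula(residues):
    """Sum the amino acid formulas, then remove one water per peptide bond."""
    total = Counter()
    for residue in residues:
        total += COMPOSITION[residue]
    for _ in range(len(residues) - 1):
        total -= WATER
    return total


# Each ordering is a distinct dipeptide, listed amino end first.
for order in permutations(["gly", "ala"]):
    print("-".join(order), dict(peptide_formula(order)))
```

Both orderings give the same elemental formula, which is why the sequence, and not just the composition, is needed to identify a peptide.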
Peptide bond structure. The peptide bond structure favors coplanar N, C, and O atoms. Although a peptide bond is formally a carbon‐nitrogen single bond, the unshared electron pairs on the carbonyl oxygen and on the nitrogen can overlap through their pi orbitals, giving the three‐atom system partial double‐bond character. This partial double‐bond character makes it harder to rotate the peptide bond in solution. As a result, peptide bonds can exist in one of two conformational isomers, with the two alpha carbons either cis or trans to each other.
Usually, the trans conformation is favored. Proline is an exceptional case because the nitrogen in its peptide bond does not carry a hydrogen, making the cis and trans isomers harder to interconvert:
Proline usually is found in the trans isomer, although conversion (isomerization) between the cis and trans forms can be catalyzed by specific enzymes.