THE GENETIC CODE

M. W. Nirenberg
National Heart Institute
National Institutes of Health
Bethesda, Maryland

DRAFT OF THE MANUSCRIPT 9/S/67

I. INTRODUCTION

In recent years studies on the genetic code, protein synthesis, and the regulation of protein synthesis have expanded to such proportions that investigators in other fields, and students, often find the literature difficult to follow. This chapter is meant to be a discussion of the code, how it was translated, and the experimental data. In writing the chapter an attempt has been made to formulate general principles from the data and to state the apparent logic which underlies the design of the code. The data now available on the code, on the structure and function of nucleic acids and protein, and on the process of protein synthesis are so extensive that it seemed essential from the outset to be selective rather than comprehensive. Coverage of topics is therefore selective, and the chapter is more of an essay than a review; it concentrates on fundamentals, especially the apparent design of the code.

B. Base Compositions of Codons

Synthetic RNA preparations containing all possible base combinations have been used as templates for protein synthesis in vitro. In practice, the major factor limiting the sensitivity of the assay is the amount of endogenous mRNA in E. coli extracts. As shown in Fig. 1, the level of endogenous mRNA is greatly reduced by incubating E. coli extracts in the presence of DNase and all components required for protein synthesis until amino acid incorporation ceases.
Protein synthesis then is almost completely dependent upon the addition of mRNA. Optimal conditions for in vitro protein synthesis stimulated by synthetic mRNA were determined ( ), and methods were devised for rapidly washing radioactive protein precipitates on cellulose nitrate filters ( ). Most radioactive products are washed with 10% trichloroacetic acid ( ); those rich in proline are washed with 20% trichloroacetic acid ( ), whereas lysine-rich products are washed with a solution containing tungstate and trichloroacetic acid ( ).

The specificity of randomly-ordered polynucleotides for amino acid incorporation into protein has been studied extensively. A summary of the results is shown in Table -. Polynucleotides with one kind of base served as templates for one amino acid each. Little template activity was detected with poly G; presumably G-G interactions inhibit the template activity of RNA (discussed in a later section).

A polynucleotide with two kinds of bases contains eight triplets: six triplets with two kinds of bases, and two triplets with one kind of base. Three such preparations were templates for four amino acids, and three preparations (poly UA, poly UG, and poly CA) served as templates for six amino acids.

There are four polynucleotides with three kinds of bases: poly UAG, poly UCG, poly UCA, and poly CAG. A polynucleotide with three kinds of bases, such as poly UAG, resembles a mixture of seven polymers: poly U, poly A, poly G, poly UA, poly UG, poly AG, and poly UAG. It contains 27 triplets: three with one kind of base, eighteen with two kinds of bases, and six with three kinds of bases. Each three-base polynucleotide directed the incorporation of ten or more amino acids into protein. The incorporation of methionine and aspartic acid, and the template activity of poly CAG for serine, were accounted for by codons containing three kinds of bases. With additional data it is possible to estimate how many of each kind of base are present in a codon, as well as which kinds of bases are present.
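The triplet counts in the preceding paragraphs follow from simple enumeration. A minimal Python sketch, assuming only that every ordered triplet of the available bases can occur in a randomly-ordered polymer; `triplet_classes` is a hypothetical helper name, not part of the original work:

```python
from itertools import product


def triplet_classes(bases: str) -> dict[int, list[str]]:
    """Group every ordered triplet over `bases` by its number of distinct bases."""
    classes: dict[int, list[str]] = {1: [], 2: [], 3: []}
    for letters in product(bases, repeat=3):
        triplet = "".join(letters)
        classes[len(set(triplet))].append(triplet)
    return classes


two = triplet_classes("UC")     # a randomly-ordered two-base polymer
three = triplet_classes("UAG")  # poly UAG

print({k: len(v) for k, v in two.items()})    # {1: 2, 2: 6, 3: 0}
print({k: len(v) for k, v in three.items()})  # {1: 3, 2: 18, 3: 6}

# The set of bases in each triplet names the simpler polymer it also occurs
# in: poly U, A, G, UA, UG, AG, or UAG.
components = {"".join(sorted(set(t))) for group in three.values() for t in group}
print(len(components))  # 7
```

The same function applied to any two-base polymer gives eight triplets, and to any three-base polymer gives 27, which is why a three-base polymer behaves like a mixture of seven simpler ones.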
Base compositions of RNA codons are derived as follows. The base composition of a polynucleotide is determined; once the base ratio of a randomly-ordered polynucleotide is known, the expected frequency of each doublet or triplet is calculated easily. By synthesizing a series of polynucleotides, each containing the same kinds of bases but in different proportions, and determining the relative proportions of amino acids directed into protein by each polynucleotide, the base composition of each codon, as well as the number of nucleotides per codon, can be estimated. The template efficiency of each polynucleotide preparation may be influenced strikingly by factors other than base composition, such as the conformation of the RNA in solution, the presence of a terminal phosphate, its molecular weight, the number of base residues per molecule, and so forth. These factors will be discussed in later sections. However, the relative amount of each amino acid incorporated into protein due to the addition of a polynucleotide preparation can be determined.

Table - shows an example of data obtained with a poly AC preparation; similar data were obtained with four other poly AC preparations, each different in base ratio ( ). The four possible doublet permutations do not contain enough specific information to code for the six amino acids directed into protein by poly AC, whereas the information content of the eight possible triplets is adequate. If every triplet were read, some amino acids would respond to two or more codons. In such cases the sum of the triplet frequencies would be compared with the corresponding amino acid incorporation data. For example, if (ACA) and (ACC) both corresponded to the same amino acid, the sum of their frequencies would be 24.9 percent, which could not be distinguished from the frequency of the doublet (CA), which is also 24.9 percent: since the third base of a poly AC triplet must be either A or C, the two triplet frequencies together equal the frequency of the doublet.
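A minimal sketch of the frequency calculation, assuming that bases are incorporated independently, so that a triplet's frequency is the product of the mole fractions of its bases. The A:C ratio below is a hypothetical value chosen for illustration, not a ratio taken from the poly AC data in the Table; `triplet_frequencies` is a hypothetical helper:

```python
import math
from itertools import product


def triplet_frequencies(ratio: dict[str, float]) -> dict[str, float]:
    """Expected frequency of every triplet in a randomly-ordered polynucleotide.

    Assumes each base is incorporated independently, so the frequency of a
    triplet is the product of the mole fractions of its three bases.
    """
    total = sum(ratio.values())
    fraction = {base: amount / total for base, amount in ratio.items()}
    return {
        "".join(t): fraction[t[0]] * fraction[t[1]] * fraction[t[2]]
        for t in product(fraction, repeat=3)
    }


# Hypothetical poly AC preparation with A:C = 53:47 (illustrative only).
freqs = triplet_frequencies({"A": 53, "C": 47})
doublet_ac = (53 / 100) * (47 / 100)

# (ACA) + (ACC): the third base is either A or C, so the sum equals the
# frequency of the AC doublet whatever the base ratio.
print(round(freqs["ACA"] + freqs["ACC"], 4), round(doublet_ac, 4))
print(math.isclose(freqs["ACA"] + freqs["ACC"], doublet_ac))  # True
```

Summing the frequencies of all triplets tentatively assigned to one amino acid gives the theoretical value to be compared with its measured incorporation.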
The relation between the theoretical frequencies and the experimentally determined frequencies of amino acid incorporation into protein is shown in Fig. -. The data demonstrate that the base composition of a histidine codon is (CAC), of an asparagine codon (AAC), and of a glutamine codon (CAA). Threonine responds either to two triplets, one of base composition (ACA) and the other (ACC), or to the doublet (AC). As shown in Fig. -, proline responds to two triplets, CCC and (CCA), or to the CC doublet, and lysine responds to the triplet AAA. Accordingly, every triplet base composition in poly AC was assigned to an amino acid as follows:

Proline      CCC, (CCA)
Histidine    (CAC)
Threonine    (ACC), (ACA)
Glutamine    (CAA)
Asparagine   (AAC)
Lysine       AAA

In this way the nucleotide compositions of approximately 50 codons were estimated by Ochoa ( ), Nirenberg ( ) and their coworkers. A summary is shown in Table -. Tentative base compositions were estimated for many codons containing three different bases. Most amino acids were found to be coded by multiple words. Since synonym codons often differ by only one base, the bases common to each set of synonym codons were assumed to occupy the same positions within each triplet. Similar results were obtained in both laboratories, although extracts were prepared from E. coli B in the Ochoa laboratory and from E. coli W3100 (a K12 strain) in the NIH laboratory.

Each of the 64 trinucleotides has been synthesized and assayed for template activity in the binding of E. coli AA-tRNA to ribosomes. Since the initial studies showed that AA-tRNA for some amino acids binds to ribosomes in response to trinucleotides at 0.02-0.03 M Mg++, but not at 0.01 M Mg++ (Nirenberg and Leder, 1964; Leder and Nirenberg, PNAS),
a relatively high Mg++ concentration (0.03 M) was selected for the initial survey of trinucleotide-ribosome-AA-tRNA complex formation. All responses found at 0.03 M Mg++ then were reassessed at 0.01 and 0.02 M Mg++. Summaries of the responses of unfractionated E. coli AA-tRNA are shown in Tables - and -.

Most trinucleotides have been assayed for template specificity with 20 AA-tRNA preparations from E. coli, each acylated with one radioactive and 19 unlabeled amino acids ( ). Unfractionated AA-tRNA was used initially in surveying trinucleotide specificity, because species of tRNA compete with one another during the formation of AA-tRNA-codon complexes, and the specificity of codon recognition can be altered by changing the relative concentrations of two or more species of tRNA.

Almost all triplets correspond to amino acids. Synonym codons appear to be logically related to one another; in most cases, synonym codons differ only in the base occupying the third position of the triplet. Only four unique patterns of degeneracy were found, each pattern determined by the kinds of bases which occupy the third positions of synonym triplets. Patterns of alternate third bases are:

1) G
2) U = C
3) A = G
4) U = C = A

A fifth pattern, U = C = A = G, is not necessarily unique, because this pattern would result if two simpler patterns were present, such as [(U = C) + (A = G)] or [(U = C = A) + (G)]. Codons specifying the initiation of protein synthesis may contain alternate bases at the first rather than the third position of the codon. For example, N-formyl-Met-tRNA responds to AUG, GUG, and possibly also UUG (discussed under Punctuation). Mutations leading to single base replacements in DNA at sites corresponding to third bases of mRNA codons may not result in amino acid replacement in protein. Hence, many such mutations are silent.
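The degeneracy patterns and the silence of most third-position changes can be checked against the standard genetic code as it is tabulated today. This is an assumption: the draft's own codon table is not reproduced here, and initiation at AUG or GUG is not modeled. `third_base_patterns` and `silent_fraction` are hypothetical helpers for this sketch:

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code listed in UCAG order for the first, second and third
# bases; "*" marks the terminator codons UAA, UAG and UGA.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_base_patterns(code: dict[str, str], doublet: str) -> list[str]:
    """Group the four third bases after `doublet` by the amino acid they specify."""
    groups: dict[str, list[str]] = {}
    for third in BASES:
        groups.setdefault(code[doublet + third], []).append(third)
    return [f"{aa}: {' = '.join(bases)}" for aa, bases in groups.items()]


def silent_fraction(code: dict[str, str], position: int) -> float:
    """Fraction of single-base changes at `position` (0, 1 or 2) in sense
    codons that leave the specified amino acid unchanged."""
    silent = total = 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if code[mutant] == aa:
                silent += 1
    return silent / total


for doublet in ("UU", "AU", "GC", "UG"):
    print(doublet, third_base_patterns(CODE, doublet))
print([round(silent_fraction(CODE, p), 3) for p in range(3)])  # [0.044, 0.0, 0.689]
```

With this table, no second-position change in a sense codon is silent, only a few first-position changes are (within the Leu and Arg codons), and roughly two thirds of third-position changes are.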
The code appears to be arranged so that the effects of base replacements in DNA, or of erroneous translation of a base in mRNA, are minimized. Possible amino acid replacements in protein which would occur as a result of single base changes can be read in Table - by moving horizontally or vertically from a given amino acid. For example, Asp-codons (GAU, GAC) are similar to Glu-codons, GAA and GAG; Ser-codons are similarly related to the codons of neighboring amino acids, and so forth.

Of the 53 codon base compositions which had been assigned on the basis of studies with synthetic, randomly-ordered polynucleotides and cell-free protein synthesizing systems, 46 agree with the base sequences assigned with trinucleotides. Polynucleotides were also prepared, each with a different 5'-terminal doublet followed by approximately 100 C residues, such as AU(C)100 (Matthaei). The results are shown in Table -. Each RNA preparation stimulated the binding of Pro- and Ser-tRNA, and Ala-tRNA responded to most of the polynucleotides. The response of Pro-tRNA to every polynucleotide is attributable to the CCC triplets of the poly C region. The very high response of Ser-tRNA to the polynucleotides was unexpected. Each RNA preparation contains three triplets at its 5'-terminus, depending upon the phase of codon recognition. For example, UUCCC(C)10 contains the triplets UUC, UCC, and CCC, according to the reading phase, and stimulates the binding of several AA-tRNAs, including Ala-tRNA, to ribosomes. In some cases, bases one to three are recognized preferentially; in other cases, bases two to four are preferred. Therefore, phasing preferences depend on the triplet rather than on the exact location of the triplet in the 5'-terminal region. Responses to many of the codons tested are sufficiently high that base sequence assignments can be derived readily. However, interpretation of the remaining data is complicated by the phasing problem. Responses of AA-tRNA to polynucleotides agree well with the responses of AA-tRNA to trinucleotides, and almost every possible triplet appears to be recognized to a greater or lesser extent.
However, three differences in the response of AA-tRNA to polynucleotides and to trinucleotides should be noted. Gly-tRNA does not respond to GGC(C)100, but does respond to the trinucleotide GGC. Leu-tRNA does not respond to CUC(C)100, but does respond to randomly-ordered poly UC; responses to the trinucleotide CUC are difficult to detect with unfractionated Leu-tRNA ( ). Finally, the responses of Ser-tRNA to polynucleotides were not observed with trinucleotides.

Polynucleotides with repeating doublet, triplet, or tetranucleotide sequences ( ) have been used to stimulate amino acid incorporation in E. coli extracts. The results show that RNA preparations which do not contain an initiator codon, such as AUG or GUG, are translated in almost every possible phase during protein synthesis in E. coli extracts. RNA with a repeating doublet sequence contains two triplets in alternating sequence, and therefore is a template for two amino acids in alternating sequence. Most RNA preparations with a repeating triplet sequence are read in three phases, each phase corresponding to a different triplet. For example, poly UUC stimulates the incorporation of radioactive phenylalanine, serine, and leucine into protein. Therefore poly UUC resembles a mixture of poly UUC, poly UCU, and poly CUU: if the reading phase were (-UUC-UUC-)n, the protein product would be polyphenylalanine; if the reading phase were (-UCU-UCU-)n, the product would be polyserine; and if the reading phase were (-CUU-CUU-)n, the expected product would be polyleucine. A polynucleotide with a repeating tetranucleotide sequence contains four triplets and therefore serves as a template for repeating tetrapeptide sequences. Polymers which stimulate incorporation of fewer than the expected number of amino acids contain terminator codons, such as UAA, UAG, or UGA (discussed under Punctuation).
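The phase argument can be illustrated with a toy translator. This is a sketch under stated assumptions, not a model of the E. coli cell-free system: it uses the standard code as tabulated today, reads each phase independently from the 5'-end, ignores initiation, and simply ends a chain at the first terminator codon. `read_phases` is a hypothetical helper:

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks terminator codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def read_phases(unit: str, repeats: int = 12) -> dict[int, str]:
    """Translate the repeating polynucleotide (unit)n in each reading phase.

    Each phase starts at base 0, 1 or 2 and stops at the first terminator.
    """
    rna = unit * repeats
    products: dict[int, str] = {}
    for phase in range(3):
        peptide = []
        for i in range(phase, len(rna) - 2, 3):
            aa = CODE[rna[i:i + 3]]
            if aa == "*":
                break  # chain termination
            peptide.append(aa)
        products[phase] = "".join(peptide)
    return products


print({phase: set(p) for phase, p in read_phases("UUC").items()})  # {0: {'F'}, 1: {'S'}, 2: {'L'}}
for unit in ("UC", "UAUC", "GUAA"):
    print(unit, read_phases(unit))
```

In this sketch poly UC gives alternating Ser and Leu, poly UAUC gives the repeating tetrapeptide Tyr-Leu-Ser-Ile in every phase, and poly GUAA yields only short products because UAA occurs in each phase.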
Results obtained with polynucleotides containing repeating sequences directly demonstrate the nucleotide sequences of terminator codons. Polynucleotides without initiator codons are translated in almost every possible phase, and the preferred mode of phasing apparently depends upon the triplet rather than the exact distance of the triplet from the 5'-terminus of the polynucleotide. In some cases, the reading phase of the RNA is known and is correlated with the phase of amino acid incorporation into protein. The codon assignment for each amino acid is shown in Table -. Most amino acids correspond to two or more codons; it should be noted that only one codon corresponds to tryptophan and one to methionine.
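Counting codons per amino acid shows which amino acids have a single codon. A minimal sketch, again assuming the standard code as tabulated today and ignoring the initiator role of GUG:

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks terminator codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Count sense codons per amino acid, ignoring the three terminators.
counts = Counter(aa for aa in CODE.values() if aa != "*")
single = sorted(aa for aa, n in counts.items() if n == 1)
print(single)                                 # ['M', 'W']
print(counts["L"], counts["S"], counts["R"])  # 6 6 6
```

The 61 sense codons cover 20 amino acids; only methionine (M) and tryptophan (W) have a single codon each.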