High school biology
How is the information in an mRNA sequence decoded to make a polypeptide? Learn how groups of three nucleotides, called codons, specify amino acids (as well as start and stop signals for translation).
Have you ever written a secret message to one of your friends? If so, you may have used some kind of code to keep the message hidden. For instance, you may have replaced the letters of the word with numbers or symbols, following a particular set of rules. In order for your friend on the other end to understand the message, he or she would need to know the code and apply the same set of rules, in reverse, to figure out what you had written.
As it turns out, decoding messages is also a key step in gene expression, the process in which information from a gene is used to construct a protein (or other functional product). How are the instructions for building a protein encoded in DNA, and how are they deciphered by the cell? In this article, we'll take a closer look at the genetic code, which allows DNA and RNA nucleotide sequences to be translated into the amino acids they represent.
Overview: Gene expression and the genetic code
Genes provide the instructions for making proteins, and these instructions are expressed in a two-step process.
- In transcription, the DNA sequence of a gene is "rewritten" using RNA nucleotides. In eukaryotes, the RNA must go through additional processing steps to become a messenger RNA, or mRNA.
- In translation, the sequence of nucleotides in the mRNA is "translated" into a sequence of amino acids in a polypeptide (protein or protein subunit).
Cells decode mRNAs by reading their nucleotides in groups of three, called codons. Each codon specifies a particular amino acid, or, in some cases, provides a "stop" signal that ends translation. In addition, the codon AUG has a special role, serving as the start codon where translation begins. The complete set of correspondences between codons and amino acids (or stop signals) is known as the genetic code.
In the rest of this article, we'll look more closely at the genetic code. First, we'll see how it was discovered. Then, we'll look more deeply at its properties, seeing how it can be used to predict the polypeptide encoded by an mRNA.
Code crackers: How the genetic code was discovered
To crack the genetic code, researchers needed to figure out how sequences of nucleotides in a DNA or RNA molecule could encode the sequence of amino acids in a polypeptide.
Why was this a tricky problem? In one of the simplest potential codes, each nucleotide in a DNA or RNA molecule might correspond to one amino acid in a polypeptide. However, this code cannot actually work, because there are 20 amino acids commonly found in proteins and just 4 nucleotide bases in DNA or RNA. Thus, researchers knew that the code must involve something more complex than a one-to-one matching of nucleotides and amino acids.
In the mid-1950s, the physicist George Gamow extended this line of thinking to deduce that the genetic code was likely composed of triplets of nucleotides. That is, he proposed that a group of 3 successive nucleotides in a gene might code for one amino acid in a polypeptide.
Gamow's reasoning was that even a doublet code (2 nucleotides per amino acid) would not work, as it would allow for only 16 ordered groups of nucleotides (4 × 4 = 16), too few to account for the 20 standard amino acids used to build proteins. A code based on nucleotide triplets, however, seemed promising: it would provide 64 unique sequences of nucleotides (4 × 4 × 4 = 64), more than enough to cover the 20 amino acids.
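Gamow's counting argument can be checked by simply enumerating the possible groups. The short Python sketch below is only an illustration: the helper name `count_code_words` is hypothetical, and the figure of 20 amino acids is taken from the text above.

```python
from itertools import product

BASES = "ACGU"          # the four RNA nucleotides
NUM_AMINO_ACIDS = 20    # standard amino acids used to build proteins


def count_code_words(length):
    """Count the ordered groups of `length` nucleotides by listing them all."""
    return sum(1 for _ in product(BASES, repeat=length))


for length in (1, 2, 3):
    n = count_code_words(length)
    verdict = "enough" if n >= NUM_AMINO_ACIDS else "too few"
    print(f"{length} nucleotide(s) per amino acid: {n} groups ({verdict})")
```

Only the triplet code offers at least as many groups as there are amino acids to encode.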
Nirenberg, Khorana, and the identification of codons
Gamow's triplet hypothesis seemed logical and was widely accepted. However, it had not been experimentally proven, and researchers still did not know which triplets of nucleotides corresponded to which amino acids.
The cracking of the genetic code began in 1961, with work from the American biochemist Marshall Nirenberg. For the first time, Nirenberg and his colleagues were able to identify specific nucleotide triplets that corresponded to particular amino acids. Their success relied on two experimental innovations:
- A way to make artificial mRNA molecules with specific, known sequences.
- A system for translating mRNAs into polypeptides outside of a cell (a "cell-free" system). Nirenberg's system consisted of cytoplasm from burst E. coli cells, which contained all of the materials needed for translation.
First, Nirenberg synthesized an mRNA molecule consisting only of the nucleotide uracil (called poly-U). When he added poly-U mRNA to the cell-free system, he found that the polypeptides made consisted exclusively of the amino acid phenylalanine. Because the only triplet in poly-U mRNA is UUU, Nirenberg concluded that UUU might code for phenylalanine. Using the same approach, he was able to show that poly-C mRNA was translated into polypeptides made exclusively of the amino acid proline, suggesting that the triplet CCC might code for proline.
Other researchers, such as the biochemist Har Gobind Khorana at the University of Wisconsin, extended Nirenberg's experiment by synthesizing artificial mRNAs with more complex sequences. For instance, in one experiment, Khorana generated a poly-UC (UCUCUCUCUC…) mRNA and added it to a cell-free system similar to Nirenberg's. He found that the poly-UC mRNA was translated into polypeptides with an alternating pattern of serine and leucine amino acids. These and other results unambiguously confirmed that the genetic code was based on triplets, or codons. Today, we know that serine is encoded by the codon UCU, while leucine is encoded by CUC.
By 1965, using the cell-free system and other techniques, Nirenberg, Khorana, and their colleagues had deciphered the entire genetic code. That is, they had identified the amino acid or "stop" signal corresponding to each of the 64 codons. For their contributions, Nirenberg and Khorana (along with another genetic code researcher, Robert Holley) received the Nobel Prize in 1968.
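The logic of these synthetic-mRNA experiments can be mimicked in a few lines of Python. This is a rough sketch rather than a model of the actual experiments: the helper `read_triplets` is hypothetical, the codon table is the standard genetic code written with one-letter amino acid symbols (F = phenylalanine, P = proline, S = serine, L = leucine), and reading simply starts at a chosen offset instead of at a start codon.

```python
BASES = "UCAG"
# Standard genetic code as one-letter amino acid symbols ("*" marks a stop).
# Codons are listed with U, C, A, G cycling at each position: UUU, UUC, UUA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def read_triplets(mrna, offset=0):
    """Translate every complete triplet of `mrna`, starting at `offset`."""
    triplets = [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]
    return "".join(CODON_TABLE[t] for t in triplets)


poly_u = "U" * 18
poly_c = "C" * 18
poly_uc = "UC" * 9

print(read_triplets(poly_u))      # FFFFFF (all phenylalanine)
print(read_triplets(poly_c))      # PPPPPP (all proline)
print(read_triplets(poly_uc))     # SLSLSL (serine and leucine alternate)
print(read_triplets(poly_uc, 1))  # LSLSL  (same pattern from the other frame)
```

Whichever position reading starts from, poly-UC contains only the triplets UCU and CUC in alternation, which matches the alternating serine and leucine pattern that was observed. A code read two nucleotides at a time would instead see a single repeated group (UC or CU) and would predict only one amino acid.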
Properties of the genetic code
As we saw above, the genetic code is based on triplets of nucleotides called codons, which specify individual amino acids in a polypeptide (or "stop" signals at its end). The codons of an mRNA are "read" one by one inside protein-and-RNA structures called ribosomes, starting at the 5' end of the mRNA and moving towards the 3' end. Let's take a closer look at the genetic code in the context of translation.
Types of codons (start, stop, and "normal")
Translation always begins at a start codon, which has the sequence AUG and encodes the amino acid methionine (Met) in most organisms. Thus, every polypeptide typically starts with methionine, although the initial methionine may be snipped off in later processing steps. A start codon is required to begin translation, but the codon AUG can also appear later in the coding sequence of an mRNA, where it simply specifies the amino acid methionine.
Once translation has begun at the start codon, the following codons of the mRNA will be read one by one, in the 5' to 3' direction. As each codon is read, the matching amino acid is added to the C-terminus of the polypeptide. Most of the codons in the genetic code specify amino acids and are read during this phase of translation.
Translation continues until a stop codon is reached. There are three stop codons in the genetic code, UAA, UAG, and UGA. Unlike start codons, stop codons don't correspond to an amino acid. Instead, they act as "stop" signals, indicating that the polypeptide is complete and causing it to be released from the ribosome. More nucleotides may appear after the stop codon in the mRNA, but will not be translated as part of the polypeptide.
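The start, read, and stop rules described above can be sketched as a small Python function. This is a simplified illustration, not a model of how ribosomes actually choose a start site: the function `translate` is hypothetical, it assumes translation begins at the first AUG in the sequence, and it uses the standard genetic code with one-letter amino acid symbols (M = methionine, F = phenylalanine, P = proline, W = tryptophan).

```python
BASES = "UCAG"
# Standard genetic code as one-letter amino acid symbols ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")          # simplification: first AUG is the start codon
    if start == -1:
        return ""                     # no start codon, so nothing is translated
    polypeptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # UAA, UAG, or UGA ends translation
            break
        polypeptide.append(amino_acid)
    return "".join(polypeptide)


# GG | AUG UUU CCC UGG | UAA | GGG
print(translate("GGAUGUUUCCCUGGUAAGGG"))  # MFPW
# An AUG inside the coding sequence simply adds another methionine.
print(translate("AUGCCCAUGUGA"))          # MPM
```

Nucleotides before the start codon and after the stop codon are ignored, just as the untranslated ends of an mRNA do not appear in the polypeptide.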
The start codon is critical because it determines where translation will begin on the mRNA. Most importantly, the position of the start codon determines the reading frame, or how the mRNA sequence is divided up into groups of three nucleotides inside the ribosome. As shown in the diagram below, the same sequence of nucleotides can encode completely different polypeptides depending on the frame in which it's read. The start codon determines which frame is chosen and thus ensures that the correct polypeptide is produced.
To see what a reading frame is, it's helpful to consider an analogy using words and letters. The following message makes sense to us because we read it in the correct frame (divide it correctly into groups of three letters): MOM AND DAD ARE MAD. If we shift the reading frame by grouping letters into threes starting one position later, however, we get: OMA NDD ADA REM AD. The frameshift results in a message that no longer makes sense.
An important point to note here is that the nucleotides in a gene are not physically organized into groups of three. Instead, what constitutes a codon is simply a matter of where the ribosome begins reading, and of what sequence of nucleotides comes after the start codon. Mutations that insert or delete a single nucleotide may alter the reading frame, resulting in the production of a "gibberish" protein similar to the scrambled sentence in the example above.
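The word analogy translates directly into code. In the Python sketch below, the helper `split_into_frames` is a hypothetical name; it groups a sequence into threes from a chosen starting offset, and the last lines show how deleting a single letter shifts every group that follows.

```python
def split_into_frames(sequence, offset=0):
    """Group `sequence` into threes, starting `offset` positions from the left."""
    return [sequence[i:i + 3] for i in range(offset, len(sequence), 3)]


message = "MOMANDDADAREMAD"

print(" ".join(split_into_frames(message)))     # MOM AND DAD ARE MAD
print(" ".join(split_into_frames(message, 1)))  # OMA NDD ADA REM AD

# Deleting one letter (the "A" of AND) leaves the first group intact
# but shifts the frame for everything downstream of the deletion.
mutated = message[:3] + message[4:]
print(" ".join(split_into_frames(mutated)))     # MOM NDD ADA REM AD
```

The same grouping applies to nucleotides: the sequence itself carries no spaces, so the starting point alone decides which triplets are read.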
One amino acid, many codons
As previously mentioned, the genetic code consists of 64 unique codons. But if there are only 20 amino acids, what are the other 44 codons doing? As we saw, a few are stop codons, but most are not. Instead, the genetic code turns out to be a degenerate code, meaning that some amino acids are specified by more than one codon. For example, proline is represented by four different codons (CCU, CCC, CCA, and CCG). If any one of these codons appears in an mRNA, it will cause proline to be added to the polypeptide chain.
Most of the amino acids in the genetic code are encoded by at least two codons. In fact, methionine and tryptophan are the only amino acids specified by a single codon. Importantly, the reverse isn't true: each codon specifies just one amino acid or stop signal. Thus, there's no ambiguity (uncertainty) in the genetic code. A particular codon in an mRNA will always be predictably translated into a particular amino acid or stop signal.
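One way to see this degeneracy is to group the codons of the standard genetic code by the amino acid they specify. The Python sketch below is illustrative only; the variable names are hypothetical, and amino acids are written with one-letter symbols (P = proline, M = methionine, W = tryptophan, "*" = stop).

```python
from collections import defaultdict

BASES = "UCAG"
# Standard genetic code as one-letter amino acid symbols ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Map each amino acid (or stop signal) to the list of codons that specify it.
codons_for = defaultdict(list)
for codon, amino_acid in zip(CODONS, AMINO_ACIDS):
    codons_for[amino_acid].append(codon)

stop_codons = codons_for.pop("*")
single_codon = sorted(aa for aa, cs in codons_for.items() if len(cs) == 1)

print(len(codons_for))    # 20 amino acids
print(stop_codons)        # ['UAA', 'UAG', 'UGA']
print(codons_for["P"])    # ['CCU', 'CCC', 'CCA', 'CCG']
print(single_codon)       # ['M', 'W']
```

Because the table is built as a mapping from codon to amino acid, each codon has exactly one meaning, while a single amino acid can appear as the value for several codons.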
The genetic code is (nearly) universal
With some minor exceptions, all living organisms on Earth use the same genetic code. This means that the codons specifying the amino acids in your cells are the same as those used by the bacteria inhabiting hydrothermal vents at the bottom of the Pacific Ocean. Even in organisms that don't use the "standard" code, the differences are relatively small, such as a change in the amino acid encoded by a particular codon.
A genetic code shared by diverse organisms provides important evidence for the common origin of life on Earth. That is, the many species on Earth today likely evolved from an ancestral organism in which the genetic code was already present. Because the code is essential to the function of cells, it would tend to remain unchanged in species across generations, as individuals with significant changes might be unable to survive. This type of evolutionary process can explain the remarkable similarity of the genetic code across present-day organisms.