Science is complex. It’s understandable that specialists will develop a jargon, a specialized language of their discoveries. But this forms a frustrating and completely unnecessary barrier to public understanding. In this series of posts I explain fundamental processes in molecular biology, the better to breach that jargon-wall.
Gene expression is one of the most important processes in the biology of a cell. It is the process that ensures that the right proteins are present in the right amount at the right time, so the myriad different cell types in our bodies can carry out their functions. The end goal of gene expression is typically (though not always) the production of proteins. A DNA strand is a chain of nucleotides strung together; two DNA strands wind together to form a DNA molecule. A gene is a section of a DNA strand, consisting of a sequence of nucleotides. A gene is expressed when the cell’s molecular machinery ‘reads’ the sequence to build a protein. Gene expression happens in two stages. First, an RNA molecule is transcribed from the DNA double helix. Next, this RNA molecule is translated: it is used as a template to assemble amino acids in a chain, and these linked amino acids form a protein.
Transcription is the first step in gene expression, during which a particular segment of the DNA (a gene) is copied into messenger RNA by the enzyme RNA Polymerase.
Transcribing the code
Like DNA, RNA is a linear molecule made up of nucleotides linked together. Its nucleotides carry the bases Adenine (A), Guanine (G), and Cytosine (C), just as in DNA, but Uracil (U) takes the place of DNA’s Thymine (T). DNA is a double-stranded helix, whereas RNA is typically a single-stranded molecule, although it can fold back on itself or associate with proteins to perform structural or catalytic functions.
Transcription begins with the unwinding of a small region of the DNA double helix, exposing the two DNA strands to the ‘transcription machinery’ of the cell. One of the DNA strands acts as the template for the RNA molecule that will be synthesised. Once formed, the RNA strand is displaced and the DNA double helix re-forms. Because RNA strands are copied from short, limited regions of DNA, they are much shorter than DNA molecules. For example, an average RNA molecule is only a few thousand nucleotides in length, while a DNA molecule can be several million nucleotides long.
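To make the template idea concrete, here is a minimal Python sketch of how an RNA sequence follows from a DNA template strand by base pairing. The function name `transcribe` and the nine-letter sequence are made up for illustration, and the sketch assumes the template is written in the direction the polymerase reads it. A real cell does far more than swap letters.

```python
# Base-pairing rules used when RNA is built against a DNA template strand:
# each DNA base pairs with its complement, and RNA uses U where DNA would use T.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the RNA sequence built against a DNA template strand.

    Assumption: the template is written in the order the polymerase reads it,
    so the RNA comes out in the order it is synthesised.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_strand.upper())


template = "TACGGCATT"   # hypothetical template strand, for illustration only
rna = transcribe(template)
print(rna)               # AUGCCGUAA
```

Note that the RNA contains no T at all: every A on the template is paired with a U.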
Just as DNA replication is performed by the enzyme DNA Polymerase, transcription is carried out by RNA Polymerase enzymes. The RNA polymerase moves along the DNA double helix, unwinding it and building a new single strand of RNA in its wake. Several RNA polymerases can attach to the same DNA molecule, one behind the other, and thus thousands of these RNA transcripts can be formed per cell per hour. It is a beautifully mechanistic process in which the RNA transcripts are mass produced as necessary to meet the demands of the cell as it responds to its environment.
The vast majority of the genes in our genome specify the amino acid sequence of proteins. The RNA molecules that are copied during transcription as described above are known as messenger RNA molecules, or mRNAs. It is important to remember that not all RNA molecules are mRNA molecules (#notallRNAs); some RNA molecules do not code for protein at all, and instead perform structural or catalytic functions. Gene expression stops at the transcription stage for these RNA genes.
A DNA molecule in a human chromosome is huge: several million nucleotides in length. The RNA polymerase therefore needs a way to recognise where on this vast molecule it should bind in order to initiate transcription. The promoter region of a gene is a small sequence of nucleotides that lies upstream of the gene. This promoter region acts as a ‘landing site’ for RNA polymerase and for other proteins known as transcription factors, which help the RNA polymerase bind to the promoter. Other proteins (including ones known as chromatin remodeling complexes and histone acetyltransferases, which I will describe in more detail in a later article on Epigenetics) are also assembled at the promoter region, and together all these proteins form a complete transcription initiation complex. Once the transcription initiation complex has been formed, the RNA polymerase shifts to ‘elongation mode’, which essentially means it begins to elongate the RNA transcript, using the DNA as a template as it moves along the DNA strand. Once RNA polymerase begins elongation, the transcription factors are released from the promoter region, free to initiate a new round of transcription with a new RNA polymerase molecule.
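As a rough analogy for the ‘landing site’ idea, the Python sketch below scans a DNA sequence for a short motif and reports where it occurs. The motif `TATAAA` (a sequence found in many, but not all, promoters), the function name `find_promoter_sites`, and the example sequence are assumptions chosen for illustration. Real promoter recognition depends on transcription factors and the state of the surrounding DNA, not on a simple text search.

```python
def find_promoter_sites(dna: str, motif: str = "TATAAA") -> list[int]:
    """Return every 0-based position where the motif starts in the sequence."""
    dna = dna.upper()
    sites = []
    start = dna.find(motif)
    while start != -1:
        sites.append(start)
        # Keep searching from the next position so overlapping hits are found too.
        start = dna.find(motif, start + 1)
    return sites


sequence = "GGCCTATAAAGGCTACGGCATTCC"   # made-up sequence, for illustration only
print(find_promoter_sites(sequence))     # [4]
```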
The newly formed RNA transcript requires further processing before it is ready to act as a template for assembling amino acids to form a protein molecule. A typical gene has many short regions of protein coding sequence (exons) that are separated by long non-coding regions (introns). The newly formed RNA transcript contains both intron and exon sequences, and it must therefore be processed to remove the intronic regions. This process is known as RNA splicing. It seems wasteful to transcribe parts of a message only to discard them soon after. But splitting a gene into exons, scattering a protein’s blueprint in discrete regions of code within the DNA sequence, allows the cell to synthesise several different proteins from the same gene, by joining the exons in different combinations through a process known as alternative splicing.
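Splicing can be sketched in a few lines of Python. The example below models a transcript as an ordered list of exon and intron segments, removes the introns, and optionally leaves out an exon to mimic one simple form of alternative splicing. The segment sequences, the function name `splice` and its `skip_exons` parameter are all hypothetical, and the sketch ignores how the cell actually finds the exon boundaries.

```python
# A freshly transcribed RNA modelled as an ordered list of (kind, sequence) segments.
# The sequences are made up for illustration.
pre_mrna = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUCCAG"),
    ("exon", "UUCGAA"),
    ("intron", "GUGAGUUUAG"),
    ("exon", "GGAUAA"),
]


def splice(segments, skip_exons=()):
    """Join exons in their original order, dropping every intron.

    skip_exons holds 0-based exon numbers to leave out, which mimics
    alternative splicing by exon skipping.
    """
    exons = [seq for kind, seq in segments if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i not in skip_exons)


print(splice(pre_mrna))                  # AUGGCCUUCGAAGGAUAA
print(splice(pre_mrna, skip_exons={1}))  # AUGGCCGGAUAA
```

The same transcript yields two different messages depending on which exons are kept, which is how one gene can give rise to more than one protein.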
Once processed, the RNA transcript (now called messenger RNA) is transported out of the nucleus into the cytoplasm of the cell, where it is translated into the amino acid sequence that, linked together, forms its protein.
Transcription is the first and sometimes only step in gene expression. It is how the cell puts the information encoded in its DNA to use. This fundamental process gives the cell the ability to control its structure and function, and is the basis for the adaptability that allowed our cells to survive and evolve for billions of years.