Lecture 6 (2015)
Ingrid Guarro Marzoa
Lecture 6: gene control by DNA-protein interactions
Introduction: How does a cell determine which of its thousands of
genes to transcribe?
As we've seen, the transcription of each gene is controlled by a regulatory
region of DNA relatively near the site where transcription begins (the promoter
region). Some regulatory regions are simple and act like switches thrown by a
single signal, but many others are more complex. Whether complex or simple,
these switching devices are found in all cells and are composed of two types of
components:
1. Short stretches of DNA of defined sequence
2. Gene regulatory proteins that recognize and bind to this DNA
The outside of the DNA helix can be read by proteins:
Gene regulatory proteins must recognize specific nucleotide sequences
embedded within the DNA structure. When we look at an alpha-helix of a protein,
we can see the different side chains (radicals) of the amino acids that
make up the protein. These side chains are able to
contact the sequence information in DNA. We have to take into account that the
edge of each base pair is exposed at the surface of the double helix and
carries different residues (chemical groups) that can interact
with the protein's side chains, mostly through hydrogen bonds. The residues are:
o Chemical groups with slightly positive charges (e- acceptor, H+ donor)
o Chemical groups with slightly negative charges (e- donor, H+ acceptor)
o Chemical groups that create hydrophobic interactions (methyl groups)
Proteins can recognize all these groups in both the major and the minor groove*.
The major groove is much more informative than the minor groove, so proteins make their specific contacts mainly with the major groove. In the minor groove, because of the arrangement of the residues that are able to accept electrons (there is one at each side), the protein is not able to differentiate the nitrogen bases.
*Major and minor grooves: in the major groove the sugar-phosphate backbones are far apart, while in the minor groove they are close together. The grooves twist around the molecule on opposite sides.
2014/2015 Molecular biology Ingrid Guarro Marzoa
Example: If my protein has to detect a G, in the major groove it will not have any problem: it is not going to confuse the G with a C, because each of the four base-pair configurations offers a unique pattern of features. In the minor groove, however, the protein cannot differentiate between G and C because the patterns are the same.
In the major groove the pattern for G is red-red-blue-white and the pattern for C is white-blue-red-red, so there is no confusion.
In the minor groove the pattern for G is red-blue-red and the pattern for C is also red-blue-red, so there can be confusion.
In this way, a specific nucleotide sequence can be "read" as a pattern of molecular features on the surface of the DNA double helix.
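The G versus C comparison can be checked mechanically. The short Python sketch below stores the colour patterns exactly as written in these notes and compares them. The colour names are treated only as labels for the chemical features of the lecture figure, and the names MAJOR_GROOVE, MINOR_GROOVE and distinguishable are made up for this illustration.

```python
# Groove patterns copied from the notes. The colours are opaque labels
# for the chemical features shown in the lecture figure.
MAJOR_GROOVE = {
    "G": ("red", "red", "blue", "white"),
    "C": ("white", "blue", "red", "red"),
}
MINOR_GROOVE = {
    "G": ("red", "blue", "red"),
    "C": ("red", "blue", "red"),
}


def distinguishable(groove, base_a, base_b):
    """Return True if the two bases expose different feature patterns."""
    return groove[base_a] != groove[base_b]


print("major groove, G vs C:", distinguishable(MAJOR_GROOVE, "G", "C"))
print("minor groove, G vs C:", distinguishable(MINOR_GROOVE, "G", "C"))
```

With the patterns listed above, the major groove comparison reports a difference and the minor groove comparison does not, which is the point of the example.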
• Helix-turn-helix (HTH): The first DNA-binding protein motif to be recognized was the helix-turn-helix (HTH). This motif is present in many proteins from bacteria and also in our cells. It is constructed from two alpha-helices connected by a short extended chain of amino acids, which constitutes the "turn".
o Recognition helix: the one that fits into the major groove, where it reads the sequence by interacting with the nitrogen bases. It makes the specific interactions.
o Stabilizing helix: interacts with the phosphate groups and the deoxyribose. These interactions are not specific, do not depend on the DNA sequence and can happen in any region of the DNA. They stabilize the interaction between the protein and the DNA.
If we calculate in how many places in a genome a protein can bind a specific sequence of DNA, the expected number of sites is (1/4)^n × 3 × 10^9, where n is the number of bases on DNA interacting with the protein, 1/4 reflects the 4 different nitrogen bases of DNA, and 3 × 10^9 (3 Gb) is the size of our genome. So, in our genome a sequence of 5 bases would be found about 3 million times on average. This means that a single HTH is not specific enough.
A protein can interact with a specific sequence because it is able to recognize specific bases. If a sequence is very frequent, the transcription factor will bind too many regions of our genome to regulate only a reduced number of genes.
To avoid this, proteins form dimers that interact with palindromic sequences of DNA.
The probability of a given 10-base sequence is about 10^-6, so it would be present in our genome only about 3000 times on average. So with the formation of a dimer, the specificity is increased.
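The two estimates (5 bases and 10 bases) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic used in these notes: it assumes that the four bases are equally frequent and independent, and that the genome has 3 × 10^9 base pairs. The names GENOME_SIZE_BP and expected_sites are made up for the illustration.

```python
GENOME_SIZE_BP = 3e9  # approximate genome size used in the notes (3 Gb)


def expected_sites(n_bases, genome_size=GENOME_SIZE_BP):
    """Expected number of occurrences of one specific n-base sequence.

    Assumes equal base frequencies (1/4 each) and independent positions.
    """
    return (1 / 4) ** n_bases * genome_size


# Monomer reading 5 bases: on the order of 3 million sites.
print(f"5 bases:  {expected_sites(5):,.0f} expected sites")
# Dimer reading 10 bases: on the order of 3000 sites.
print(f"10 bases: {expected_sites(10):,.0f} expected sites")
```

Doubling the number of bases read, from 5 to 10, divides the expected number of sites by 4^5 = 1024.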
The second protein of the dimer is identical to the first one, but it is rotated 180º so that it reads the other half of the palindromic sequence.
Summary: if only a single protein (monomer) bound, there would be too many places (too many identical sequences) in which it could bind. That is why a dimer is needed: it increases the number of bases in the recognized sequence and therefore the specificity.
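A DNA sequence is palindromic in this sense when it reads the same on both strands, that is, when it equals its own reverse complement. That is why a second, identical monomer rotated 180º finds the same bases. A minimal Python check is sketched below. The function names and the test sequences are arbitrary choices for the illustration and are not taken from the lecture.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """True if both strands read the same (two-fold symmetry axis)."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic("GAATTC"))  # True: both strands read GAATTC
print(is_palindromic("GAATTG"))  # False: the opposite strand reads CAATTC
```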
• Leucine zipper: The leucine zipper is responsible for dimerization and also for DNA binding. It is named after the way the two alpha-helices (one from each monomer) are joined together to form a short coiled-coil. The helices are held together by interactions between hydrophobic amino acid side chains (often leucines) that extend from one side of each helix. Leucines are very hydrophobic, so they can form the zipper. Thanks to this zipper, which unites two protein monomers, the dimer can recognize many more bases on DNA. It is also a way to increase variability, because the interaction between the two helices is not specific to one partner: two different monomers can pair as long as they both contain a leucine zipper.
Both the leucine zipper and the HTH allow cells to generate dimers that control hybrid sequences.
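To get a feeling for how much variability dimerization can add, the sketch below counts the dimers that a small set of monomers could form. It assumes, as a simplification of the statement above, that any two monomers carrying a leucine zipper can pair with each other. The monomer names A, B and C are hypothetical.

```python
from itertools import combinations_with_replacement


def possible_dimers(monomers):
    """List every homodimer and heterodimer that the monomers could form.

    Simplifying assumption: any two zipper-containing monomers can pair.
    """
    return list(combinations_with_replacement(monomers, 2))


dimers = possible_dimers(["A", "B", "C"])  # hypothetical monomers
print(dimers)
print(len(dimers), "dimers from 3 monomers")  # 3 homodimers + 3 heterodimers
```

Each heterodimer joins two different recognition helices, so it would read a hybrid sequence made of two different half-sites.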
• Zinc fingers: The Zn finger is quite common in the proteins encoded by our genes. It forms a structure in which two alpha-helices are packed together with zinc atoms. The Zn is a very important component and is coordinated by cysteine and histidine. The amino acids between positions 6 and 19 create a finger (diagram). Proteins that interact with DNA don't contain only one finger; they have many fingers, which contact different sequences along long regions of DNA. In this way, our transcription factors can recognize long sequences of DNA. As usual, the side chains (radicals) of the amino acids make contact with the bases mostly in the major groove of DNA.
• How genetic switches work: We can distinguish two types of genes according to their type of transcriptional control:
o Negative control: the specific transcription factor acts as a repressor. This kind of control is frequent in bacteria (prokaryotes). The repressor is a protein that binds DNA very close to the promoter and, by binding, prevents RNA polymerases from doing their job. Repressors PREVENT transcription.
o Positive control: the specific transcription factors act as activators. This kind of control is more frequent in eukaryotes. Activators ACTIVATE transcription.
These control systems are not mutually exclusive; a gene can be controlled by both systems at once.
Why do our cells mostly use positive control? For our cells it is cheaper to activate the genes that a particular cell type needs (a hepatocyte, for instance) than to inhibit all the other genes that the cell doesn't need to express (i.e. the hepatocyte does not need the genes specific for neurons, skin cells, blood cells, ...).
Why do bacteria mostly use negative control? Bacteria are unicellular organisms, so one cell has to carry out all the functions and usually expresses many of its genes. Bacteria have to react to the environment in seconds, so negative control is quicker: the cell only needs to generate a repressor instead of generating all the transcription factors that activate transcription.
• Example of negative control system in E. coli: In this case, we study how E. coli reacts to lactose (lac operon).
The lac operon is a paradigm among negative control systems in prokaryotes.
An operon is a group of genes expressed from the same promoter (P) by means of a single mRNA that codes for all of them. In this case the lac operon contains the genes that code for the following proteins:
o Beta-galactosidase
o Permease
o Transacetylase
These proteins are used by E. coli to convert lactose into glucose in order to obtain energy from it.
If the bacterium senses that there is no lactose in the medium, the genes that code for these proteins are not expressed. But when there is lactose in the medium, the genes are expressed.
Process: there is a DNA sequence called the operator, which is bound by the repressor protein (in this case LacI) and has a symmetry axis (it is palindromic).
This sequence overlaps with the promoter sequences.
o Repressed state: the repressor acts as a tetramer. This tetramer binds the operator using two monomers (a dimer like the ones described before). These two monomers block transcription.
o Induced state: the repressor has a domain (cavity) that can bind lactose. When lactose binds to the repressor, it induces a conformational change in the repressor. This change releases the repressor from the DNA and, consequently, RNA polymerases are free to bind the promoter and start transcription.
• Tetramer structure of repressors: If the repressor were a monomer, the presence of lactose would cause a continuous increase (red curve) of the repressor+lactose complex until saturation.
With small quantities of lactose there would be proportionally small quantities of repressor+lactose complex, and for the cell this would not be efficient: the genes involved in the utilization of lactose would be activated and would consume energy constantly. To avoid this, bacteria have tetramers. Lactose binds to the tetramer in a different way, following a sigmoidal curve (green). This is called allosterism and is a key property. With this system, small quantities of lactose will not activate the operon. In this way, bacteria do not spend a lot of energy producing the necessary proteins and only produce them if the lactose concentration is high enough.
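The difference between the two curves can be illustrated with the Hill equation, a standard textbook model of cooperative binding that these notes do not use explicitly. In the sketch below, hill_n = 1 stands for the hypothetical monomer and hill_n = 4 for an idealized, fully cooperative tetramer. The half-saturation constant k_half = 1.0 and the concentrations are arbitrary units chosen for the illustration.

```python
def fraction_bound(lactose, k_half=1.0, hill_n=1):
    """Hill equation: fraction of repressor bound to lactose.

    hill_n = 1 gives a hyperbolic curve (no cooperativity);
    hill_n > 1 gives a sigmoidal curve (cooperative binding).
    """
    return lactose**hill_n / (k_half**hill_n + lactose**hill_n)


print("lactose  monomer  tetramer")
for conc in (0.1, 0.5, 1.0, 2.0, 10.0):
    monomer = fraction_bound(conc, hill_n=1)   # hypothetical monomer
    tetramer = fraction_bound(conc, hill_n=4)  # idealized tetramer
    print(f"{conc:7.1f}  {monomer:7.4f}  {tetramer:8.4f}")
```

At a concentration ten times below k_half, the monomer model is already about 9% bound, while the cooperative model is about 0.01% bound. Both reach 50% at k_half, and the cooperative curve rises steeply around that point, which gives the switch-like behaviour described above.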
• Example of positive control system in E. coli: The lac operon is not only under this negative control; it is also under positive control, in which other proteins take part.
In this case, another protein called CAP is able to bind DNA, but only in the presence of cyclic AMP (cAMP).
If there is glucose, the cell activates a molecule that reduces the levels of cAMP and, in consequence, CAP does not bind DNA. Without CAP, the RNA polymerase can't bind the DNA and can't start transcription.
We can deduce that in the lac operon the presence of glucose inactivates the expression of the genes and the presence of lactose activates it.
• Cell situations:
1. Presence of glucose and lactose: Although lactose inactivates the repressor, cAMP levels are low and CAP is inactive, so RNA polymerase can't bind DNA and the genes are not expressed. When there is glucose, the cell doesn't want to spend energy making the enzymes for lactose, so the genes are not expressed.
2. Presence of glucose but not of lactose: CAP is inactive (RNA polymerase can't bind DNA) and the repressor is active (RNA polymerase can't bind DNA). The genes are not expressed. When there is glucose, the cell can obtain energy from it and, as there is no lactose, producing the enzymes is not necessary.
3. Absence of both glucose and lactose: CAP is active (RNA polymerase can bind DNA) but the repressor is active (RNA polymerase can't bind DNA). The genes are not expressed. In this situation the cell obtains energy from another source, which requires the expression of other genes.
4. Presence of lactose but not of glucose: CAP is active (RNA polymerase can bind DNA) and the repressor is inactive (RNA polymerase can bind DNA). This is the only situation in which the genes are expressed.
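The four situations amount to a small truth table: the genes are expressed only when CAP is active and the repressor is not bound. The Python sketch below encodes this as an all-or-nothing model, which is a simplification made only for the illustration. The function name lac_operon_state and the dictionary keys are made up.

```python
def lac_operon_state(glucose, lactose):
    """All-or-nothing model of lac operon control.

    glucose, lactose: True if the sugar is present in the medium.
    """
    cap_active = not glucose       # glucose lowers cAMP, so CAP can't bind DNA
    repressor_bound = not lactose  # lactose releases the repressor from the operator
    expressed = cap_active and not repressor_bound
    return {
        "cap_active": cap_active,
        "repressor_bound": repressor_bound,
        "expressed": expressed,
    }


for glucose in (True, False):
    for lactose in (True, False):
        state = lac_operon_state(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} "
              f"expressed={state['expressed']}")
```

Only the combination of no glucose and lactose present reports expression, matching situation 4.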
• E. coli sigma factors: E. coli has several sigma factors (proteins capable of recognizing promoters with different consensus sequences), although sigma 70 is the general one.
There are some sigma factors that are only activated in response to a heat shock, so the genes that depend on these sigma factors will only be expressed if there is a heat shock. In their basal conformation the heat shock sigma factors can't bind DNA, but if the temperature increases they can bind DNA and work as sigma factors.
The groups of genes regulated by one sigma factor are called regulons.
Regulons are groups of different genes, each with its own promoter, but all of these promoters share the same -35 and -10 sequences recognized by that sigma factor.