Topic 5, Genomics (2017). Notes in English
Includes the notes for Topic 5 of the Genomics course, with the relevant illustrations. Evolution by Gene Duplication.
Natalia Mingorance García
3r Biologia – UdG
TOPIC 5: Evolution by Gene Duplication
Basic model for generating genetic variability
In the basic model of gene evolution, the genome changes by mutation. DNA
sequences change by mutation, and for a mutation to become fixed it has to
pass a filter, which is a combination of selection and genetic drift. When a
mutation passes this filter we have a new, or changed, DNA sequence. So,
starting with one function, this cycle can end with DNA that is able to carry
out a new function. Most mutations are eliminated at the filter step, but some
pass it and give a new sequence. Specifically, SNPs (single nucleotide
polymorphisms) are the most abundant mutations in our genome.
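The "filter" of selection and drift can be illustrated with a small simulation. The sketch below is not part of the lecture: it assumes a haploid Wright-Fisher population, and the function name, population size, selective advantage and number of trials are all illustrative choices. It estimates how often a mutation that starts as a single copy ends up fixed.

```python
import random


def fixation_fraction(pop_size, advantage, trials, seed=0):
    """Fraction of new single-copy mutations that reach fixation.

    Haploid Wright-Fisher sketch (an assumption, not the lecture's model):
    every individual of the next generation is drawn at random (drift),
    with mutants weighted by 1 + advantage (selection).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        mutants = 1  # the mutation starts as a single copy
        while 0 < mutants < pop_size:
            weighted = mutants * (1 + advantage)
            p_mutant = weighted / (weighted + (pop_size - mutants))
            # Binomial sampling of the next generation, one draw per individual.
            mutants = sum(rng.random() < p_mutant for _ in range(pop_size))
        if mutants == pop_size:
            fixed += 1
    return fixed / trials


# Illustrative parameters only.
neutral = fixation_fraction(pop_size=50, advantage=0.0, trials=1000)
beneficial = fixation_fraction(pop_size=50, advantage=0.2, trials=1000)
print(f"neutral: {neutral:.3f}  beneficial: {beneficial:.3f}")
```

In both cases most mutations are lost, which matches the idea that most of them do not pass the filter; an advantageous mutation simply passes it more often than a neutral one.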
In the blue part of the diagram we have the concept of diversity, which is the number of species or genera. If you look at evolution you can see the Cambrian explosion.
This is an explosion of species (about 500 Myr ago): diversity suddenly increased and the number of species exploded. In a very short period there was a huge increase in forms and functions. To explain this diversity and disparity we need a mutation pattern able to produce such an increase in species. In other words, the Cambrian explosion was caused by changes in the genome.
We need a new model of mutation to explain this explosion.
Modified model for generating genetic variability
We start with a DNA sequence, as in the previous model, but now we have mutations of different kinds: not only SNPs but also insertions, deletions and duplications. Gene or genome duplication is one of the main mechanisms proposed to explain the Cambrian explosion, and it is probably the most important kind of mutation for generating new functions.
Now we are going to talk about gene duplications. Early in the last century Haldane introduced the concept of duplication as one of the most important mechanisms for obtaining new functions. He said that a redundant duplicate of a gene may acquire divergent mutations and eventually emerge as a new gene.
At that time it was not known that genes are contained in DNA. Susumu Ohno (1970) was the father of this duplication theory, and he said that natural selection merely modified, while redundancy created.
Gene duplication as one of the mechanisms to modify the genome
One of the first pieces of evidence was reported by Bridges early in the last century: the duplication of a section of a chromosome in Drosophila, called Bar. The duplication of this chromosome section reduces the size of the eye. That was the first evidence that a duplication has some kind of phenotypic effect. Certainly, some genes are involved in these changes.
Since then, gene duplications have been found in the genomes of all organisms. In Homo sapiens about 40% of our genes are duplicated, counting only protein-coding genes. The average rate of duplication is about one duplication per gene every one hundred million years; that is, a new duplication emerges successfully, after selection and drift, once every one hundred million years.
What are the mechanisms that produce gene duplications?
1- Non-homologous recombination (meiosis): the different kinds of non-homologous recombination all produce tandem duplications.
2- Replication error (replication slippage): the replication machinery (its proteins) jumps from one strand to the other, which produces inverted tandem repeats. The copy comes from the other strand of DNA.
There are other mechanisms that produce duplications, and the most important one here is:
3- 3’ transduction, related to the weak poly-A signal of LINEs (a LINE is a kind of retrotransposon, and some of them have a weak, shorter poly-A signal). When that happens we can have 3’ transduction.
The other two mechanisms are very similar and are based on capturing fragments when there is a break in the DNA.
4- Fragment capture (NUMTs, mitochondrial DNA in the nuclear genome): when there is a break in the DNA, some fragments can be incorporated into it.
There are only two laws of Mendel.
Most of the products of these 4 mechanisms are what is called dead on arrival. When the new fragment is inserted in a new position it is dead: it does not have any function (pseudogenization). In some instances, however, it can have a huge impact on the genome and on the organism.
Of all these mechanisms, which are the most important for generating duplicated genes? We have three main mechanisms (the fourth is not taken into account).
C. elegans is a worm used as a model organism. In this species the young duplicated genes were studied. What is a young duplicated gene? It is a duplicate with less than 10% divergence at silent sites. A silent site is a position where a mutation does not produce an amino acid change (there is no selection here, or only weak selection pressure).
Genes that diverge by less than 10% at these sites have been duplicated recently, and only these genes were examined. The average length of a duplication is 1.4 kb, but most of them, about 70%, have 2.5 kb. A gene in this species has an average length of 2.5 kb.
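The "less than 10% divergence at silent sites" rule can be sketched in code. This is only a rough illustration, not the method used in the study: it assumes two aligned, gap-free, in-frame coding sequences and counts differences only at fourfold-degenerate third codon positions, without any correction for multiple substitutions. The function names and the toy sequences are hypothetical.

```python
# Codon prefixes whose third position is fourfold degenerate in the standard
# genetic code: any base at that position encodes the same amino acid.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def silent_site_divergence(cds_a, cds_b):
    """Proportion of differing bases at shared fourfold-degenerate sites.

    Both inputs must be aligned, gap-free, in-frame coding sequences of
    equal length. Returns None when no comparable site exists.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, equal length")
    sites = differences = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Only compare codons sharing a fourfold-degenerate prefix, so a
        # third-position change cannot alter the amino acid.
        if codon_a[:2] == codon_b[:2] and codon_a[:2] in FOURFOLD_PREFIXES:
            sites += 1
            differences += codon_a[2] != codon_b[2]
    return differences / sites if sites else None


def is_young_duplicate(cds_a, cds_b, threshold=0.10):
    """Apply the 'less than 10% divergence at silent sites' rule."""
    divergence = silent_site_divergence(cds_a, cds_b)
    return divergence is not None and divergence < threshold


# Hypothetical toy sequences, not real C. elegans genes.
copy_1 = "GCT" * 20
copy_2 = "GCT" * 19 + "GCC"  # one silent difference in 20 sites
print(silent_site_divergence(copy_1, copy_2), is_young_duplicate(copy_1, copy_2))
```

Real analyses use more careful estimators of synonymous divergence; the point here is only that the classification rests on sites where changes do not alter the protein.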
In this study the duplications were found to be shorter than the gene. They were classified into three kinds:
- Complete duplications: the whole gene is duplicated (50% of duplications).
- Partial duplications: only a section of a gene is duplicated (20%).
- Chimeric duplications: a combination of two duplication events in the same fragment, with two different genes duplicated and put together (30%).
The other 50% of duplications, the partial and chimeric ones, are non-complete duplications.
Given these three kinds of duplications, where are they located? They can be in tandem or they can be interspersed.
90% of the duplications are located in tandem. The main mechanism that generates tandem duplications is a recombination error in meiosis. But of the 90% located in tandem, 70% are inverted tandem repeats. All tandem repeats were supposed to be produced by a recombination error, yet most of them were observed to be inverted, and the mechanism that generates inverted tandem repeats is a replication error.
So, at least for C. elegans, the main mechanism producing duplications is a replication error.
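The percentages above can be combined to see why a replication error comes out as the main mechanism. The short calculation below uses only the two figures given in the notes (90% in tandem, 70% of those inverted); the variable names are just for illustration.

```python
from fractions import Fraction

tandem = Fraction(90, 100)                 # share of duplications in tandem
inverted_given_tandem = Fraction(70, 100)  # share of tandem ones that are inverted

inverted_tandem = tandem * inverted_given_tandem
direct_tandem = tandem - inverted_tandem
interspersed = 1 - tandem

for label, share in [
    ("inverted tandem", inverted_tandem),
    ("direct tandem", direct_tandem),
    ("interspersed", interspersed),
]:
    print(f"{label}: {float(share):.0%}")
# prints 63%, 27% and 10% respectively
```

So about 63% of all young duplications are inverted tandem repeats, more than the direct tandem and interspersed ones together.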
THE FATE OF DUPLICATED GENES
When a gene or a section of a gene is duplicated, the most probable outcome is dead on arrival: no function. But some duplicates do have a function:
- Conservation of gene function (rRNA, AMY1). The gene is duplicated but the copies have the same function: same gene, same function, maintained through concerted evolution by gene conversion. This conservation of gene function is clearly demonstrated in rRNA genes: M. genitalium has 2 rRNA genes and X. laevis has more than 500.
For instance, Mycoplasma genitalium is a bacterium (eubacterium) with one of the smallest genomes reported, and it has only 2 ribosomal genes. A frog (X. laevis, another model for studying genomics) has more than five hundred copies of exactly the same gene with the same function. More complex organisms have more copies of that particular gene, so they can produce a large amount of this kind of RNA.
Another example of duplicated genes with conservation of gene function is the human starch metabolism gene, AMY1. We have several copies of this gene in our genome. Human populations can be divided into two classes:
- High starch populations (they eat a lot of starch): 70% of the people have 6 or more copies of the gene.
- Low starch populations (they eat little starch): 37% of the people have 6 or more copies of the gene (a much lower percentage).
In Africa the diet is different and high starch populations are close to low starch populations. In Spain the vast majority of the population is a high starch population.
The difference in the number of copies is related to diet, that is, to the consumption of starch. So we have the same gene with the same function but a different number of copies.
In the figure, grey represents high starch populations and red represents low starch populations.
The largest share of the low starch population has 4 copies of the gene, whereas in the high starch population people have 6, 7 or more copies. The mean number of copies is 6.7 for the high starch population and 5.44 for the low starch population. With more copies, transcription is more efficient and more protein is produced, so in the end starch is metabolized better.
The gene is duplicated, but it always performs the same function.
- Pseudogenization/nonfunctionalization: olfactory receptor gene family.
- Neofunctionalization: ECP and EDN.
- Subfunctionalization: hoxb1b, hoxb1a.
These last 3 mechanisms belong to the DDC model: the duplication, degeneration and complementation model.
Imagine that we have a gene of any kind, for example a protein-coding gene. On the left we have the regulatory sequences and on the right the sequences coding for amino acids. This gene has an ancestral function that is very important for the organism: the ancestral function must always be present, otherwise the organism will die. Then there is a complete duplication of the gene (both regulatory and coding sequences). Because the gene is duplicated, we still have the ancestral function. From here there are several scenarios.
1. Pseudogenization
The regulatory sequences, or even the coding sequences, can mutate and change. The ancestral function is essential, so one copy keeps it. The other copy is heavily mutated: it accumulates many mutations in both kinds of sequence and becomes non-functional. We then have a pseudogene, and this process is pseudogenization.
2. Neofunctionalization
Several mutations occur, and mutations can produce different functions. Complementation is needed because the ancestral function must always be maintained: the two duplicated genes together keep the ancestral function, and the regulatory sequences must always be present.
Between them, the two copies have all the regulatory sequences, but distributed over two different genes. In the end, one of the genes has a new function because it carries a mutation in its regulatory sequences. The ancestral function is maintained by complementation, and the copies are considered two different genes because one of them has a mutation in the regulatory sequence.
3. Subfunctionalization
One copy of the gene has lost some of the functions, so that copy is subfunctional. The other copy has lost other functions, so it is subfunctional too. Because the ancestral function must be maintained, a fully functional organism needs the complementation of the two copies. Together they reproduce the ancestral functions; there is no new function.
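The three scenarios can be summarised as a small decision rule. The sketch below is only an illustration of the reasoning in these notes, not a formal statement of the DDC model: each gene copy is represented as a set of subfunctions (for example regulatory elements), and the function name, the labels and the example element names are assumptions made for the example.

```python
def classify_fate(ancestral, copy_a, copy_b):
    """Label the fate of a duplicated gene from the subfunctions of each copy.

    Each argument is a set of subfunction names. The ancestral function
    counts as maintained when the two copies together cover `ancestral`.
    """
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)
    if not ancestral <= (copy_a | copy_b):
        return "ancestral function lost"
    if not copy_a or not copy_b:
        return "pseudogenization"  # one copy has no function left
    if (copy_a | copy_b) - ancestral:
        return "neofunctionalization"  # a copy gained something new
    if copy_a < ancestral and copy_b < ancestral:
        return "subfunctionalization"  # each copy lost a different part
    if copy_a == ancestral and copy_b == ancestral:
        return "conservation of function"
    return "partial degeneration"  # one copy intact, the other reduced


# Illustrative labels for two regulatory elements.
ancestral = {"autoregulatory", "RARE"}
print(classify_fate(ancestral, {"autoregulatory"}, {"RARE"}))
print(classify_fate(ancestral, ancestral, set()))
print(classify_fate(ancestral, ancestral, {"RARE", "new_enhancer"}))
```

The first check encodes the rule repeated throughout the notes: whatever happens to the individual copies, the ancestral function must still be covered by one gene or by the complementation of the two.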
Exposome
BIOMARKERS: metabolomics, epigenomics, transcriptomics.
Endocrine disruptors: chemicals that interrupt the normal functions of endocrine hormones. As the dose increases, the toxic effects increase. They affect more pathways than the estrogen and androgen ones.
Sources: how we get exposed to these chemicals.
Sensitive or critical periods: when we are more affected by these chemicals.
OMICS era: DNA is not the only thing that regulates biological processes; at a higher level we also study the metabolome, the proteome…
Experimental pipeline: targeted vs non-targeted.
Targeted: we need some prior knowledge, because we want to observe a specific thing and we need a prediction. For example PCR: we choose the genes that we want to analyze.
Non-targeted: it is a global experiment; we do not know what is inside and we need to annotate it as we discover it. It helps us to discover new biomarkers. For example, electrophoresis of proteins.
We have so much data from only one event.
Single nucleotide polymorphism. From 1 individual (patient 1) we have done different analyses, but sometimes we do not have more than 1 sample for the different analyses.
a) Concatenation-based integration: we analyze all the data together with the different phenotypes. The matrices are really different and we need at least the 3 elements together.
b) Transformation-based integration: the elements are split and we analyze each one individually. You make a prediction from each.
c) Model-based integration: each element is used to predict the phenotypes separately and then the predictions are put together.
Metabolomic, transcriptomic, and epigenetic effects of BPA exposure…
Bisphenol A is a very well known endocrine disruptor. There are biomarkers showing that the exposure affects different pathways while still allowing the organisms to survive. At a glance you cannot see differences, but at the molecular level you can.
HEATMAP: how the genes are differentially expressed. Blue = low expression and red = high expression. Some genes are downregulated in the control and upregulated in the treatment, and others the inverse.
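A common way to decide whether a gene counts as up- or downregulated is the log2 fold change between treatment and control. The sketch below is an assumption about how such labels could be derived, not the procedure of the study discussed in class; the twofold cutoff, the gene names and the expression values are all hypothetical.

```python
import math


def regulation(control, treatment, min_log2_fc=1.0):
    """Label a gene from its mean expression in two conditions.

    min_log2_fc=1.0 means 'at least a twofold change'; this cutoff is an
    illustrative assumption. Both values must be positive.
    """
    log2_fc = math.log2(treatment / control)
    if log2_fc >= min_log2_fc:
        return "upregulated"
    if log2_fc <= -min_log2_fc:
        return "downregulated"
    return "unchanged"


# Hypothetical expression values (arbitrary units): (control, treatment).
genes = {"gene_a": (10.0, 80.0), "gene_b": (64.0, 8.0), "gene_c": (20.0, 22.0)}
labels = {name: regulation(c, t) for name, (c, t) in genes.items()}
print(labels)
```

In a heatmap, the colour of each cell plays the same role as these labels: red for high expression and blue for low expression.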
GO term analysis: specific functions; a statistical method that tells which pathways are altered. Some genes are upregulated and others downregulated. White = embryonic forms, whose expression decreases over time; black = adult forms, whose expression increases over time.
Genes that should be expressed in the adult stage are being expressed in the larval stage because of the exposure to BPA.
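The notes do not say which statistical method is behind the GO term analysis. One widely used option is an over-representation test based on the hypergeometric distribution, sketched below under that assumption; the function name and all the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(genome_size, term_size, selected, overlap):
    """P(at least `overlap` term genes among `selected` genes drawn at random).

    Upper tail of the hypergeometric distribution: `genome_size` genes in
    total, `term_size` of them annotated with the GO term.
    """
    upper = min(term_size, selected)
    total = comb(genome_size, selected)
    tail = sum(
        comb(term_size, k) * comb(genome_size - term_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1000 genes, 50 annotated with the term, 100 genes
# differentially expressed, 15 of which carry the annotation.
p = enrichment_p_value(genome_size=1000, term_size=50, selected=100, overlap=15)
print(f"p = {p:.2e}")
```

A small p-value means that the term appears among the altered genes more often than expected by chance, which is what "this pathway is altered" refers to. In practice many terms are tested at once, so a correction for multiple testing is also needed.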
RNA-seq allows you to analyse more genes than microarrays.
Genes upregulated (orange) but metabolites less abundant (blue balls).
CG (CpG) positions are where methylation takes place. High levels of methylation mean low expression of the gene.
The fate of duplicated genes - DDC Model
- Nonfunctionalization: one copy keeps the ancestral function and the new copy accumulates mutations and has no function (deleterious mutation).
- Neofunctionalization: one of the copies has a new function. The new function can be the same function in another place or at another stage of development.
- Subfunctionalization: the two copies each lose part of the function, but the complementation of the two copies gives the ancestral function.
The ancestral function is always maintained. The function carried out by the original gene is always present in the organism, either because one gene performs it or because two different genes complement each other's functions.
NEOFUNCTIONALIZATION
A gene called EDN: eosinophil-derived neurotoxin. Bacterial gene in humans. The duplication was generated 31 MY ago and one of the copies mutated to a new function.
It was transformed into ECP: eosinophil cationic protein. This is neofunctionalization. The duplicated genes are found in OW (Old World) monkeys and hominoids.
One of the copies acquired a new function (similar but different) and was distributed in these two lineages.
Within one genome the two genes are paralogous (copies of the same gene in the same genome). Copies located in different genomes or different species, such as OW monkeys and hominoids, are orthologous.
SUBFUNCTIONALIZATION – DDC Model and HOX genes (Body plan).
HOX GENES (homeobox genes) are genes expressed during the development of all animals (and only animals). The figure shows Drosophila melanogaster. A mutation in a particular Hox gene can give particular phenotypes. For instance, a fly with 4 wings: a mutation in the Ubx gene (Ultrabithorax).
Development of legs where the antennae are supposed to be: antennae transformed into legs. Antp mutation (Antennapedia).
There are 8 HOX genes in the fly. The first gene is expressed in the most anterior part of the body, the following segment expresses the second gene, and so on; the last gene is expressed in the tail of the organism. The location of the genes is extremely important.
If the location is changed, mutant phenotypes are produced.
In mammals we have a single copy of this particular gene (only one copy of the Hox genes). Zebrafish (a teleost), however, have 2 copies of the Hox genes (all of them). We talk here about one Hox gene, but the same applies to all Hox genes.
In mouse, the Hoxb1 gene has two regulatory sequences (circles) and the coding sequence (rectangle). The regulatory sequences have different functions: one is an autoregulatory sequence and the other is a retinoic acid response element (RARE), which responds to a hormone.
In zebrafish today, one of the copies has lost the autoregulatory sequence and the other copy has lost the RARE element. So we have complementary degenerative mutations: subfunctionalization. Each copy lacks one of the elements, so each is subfunctional, but the complementation of the two copies reproduces the ancestral function.
The Hoxb1b gene is expressed early in zebrafish development and the other copy, Hoxb1a, is expressed late. The combination of these two expression patterns results in a gene expression similar to that of mouse Hoxb1 (the ancestral gene).
The second gene only starts to be expressed after 12 or 16 hours. The colour shows the expression. The other copy is waiting to be expressed. There is some simultaneous expression, and when the early one stops the late one starts to be expressed. This is a complementary degenerative mutation, or subfunctionalization.
VIDEO – GENE DUPLICATION
Where do genes come from? – Carl Zimmer
You have about 20,000 genes in your DNA. They encode the molecules that make up your body, from the keratin in your toenails to the collagen at the tip of your nose, to the dopamine surging around inside your brain. Other species have genes of their own. A spider has genes for spider silk. An oak tree has genes for chlorophyll, which turns sunlight into wood. Where did all those genes come from? It depends on the gene.
Scientists suspect that life started on Earth about 4 billion years ago. The early life forms were primitive microbes with a basic set of genes for the basic tasks required to stay alive. They passed down those basic genes to their offspring through billions of generations.
Some of them still do the same jobs in our cells today, like copying DNA. But none of those microbes had genes for spider silk or dopamine. There are a lot more genes on Earth today than there were back then. It turns out that a lot of these extra genes were born from mistakes. Each time a cell divides it makes new copies of its DNA.
Sometimes it accidentally copies the same stretch of DNA twice. In the process, it may make an extra copy of one of its genes.
At first, the extra gene works the same as the original one. But over the generations it may pick up new mutations. Those mutations may change how the gene works, and that new gene may duplicate again.
A surprising number of our mutated genes emerged more recently; many in just the past few million years. The youngest evolved after our own species broke off from the apes. While it may take over a million years for a single gene to give rise to a whole family of genes, scientists are finding that once the new genes evolve, they can quickly take on essential functions.
For example, we have hundreds of genes for the proteins in our noses that grab odor molecules. The mutations let them grab different molecules, giving us the power to perceive trillions of different smells.
Sometimes mutations have a bigger effect on new copies of genes. They may cause a gene to make its protein in a different organ or at a different time of life or the protein may start doing a different job altogether.
In snakes, for example, there’s a gene that makes a protein for killing bacteria. Long ago, the gene duplicated and the new copy mutated. That mutation changed the signal in the gene about where it should make its protein. Instead of becoming active in the snake’s pancreas, it started making this bacteria-killing protein in the snake’s mouth.
So when the snake bit its prey, this enzyme got into the animal’s wound. And when this protein proved to have a harmful effect, and helped the snake catch more prey, it became favored. So now what was a gene in the pancreas makes venom in the mouth that kills the snake’s prey.
There is another way to make a new gene. The DNA of animals, plants and other species contains huge stretches without any protein-coding genes. As far as scientists can tell, it is mostly random sequences of genetic gibberish that serve no function.
These stretches of DNA sometimes mutate, just like genes do. Sometimes those mutations turn the DNA into a place where a cell can start reading it. Suddenly the cell is making a new protein. At first, the protein may be useless, or even harmful, but more mutations can change the shape of the protein and the protein may start doing something useful.
Scientists have found these new genes at work in many parts of animal bodies. So our 20,000 genes have many origins, from the origin of life to new genes still coming into existence from scratch. As long as life is here on Earth, it will be making new genes.