Topic 3, Genomics (2017). Notes in English
Includes the notes for Topic 3 of the Genomics course, with the corresponding illustrations. Genes.
Natalia Mingorance García
3rd year Biology – UdG
TOPIC 3: GENES
The definition of a gene: a DNA segment that contributes to phenotype/function. In the absence of demonstrated function, a gene may be characterized by sequence, transcription or homology.
A gene usually has a promoter located 5' of it, in cis. The gene also has UTRs (UNTRANSLATED REGIONS): regions that are present in the mRNA but are not translated into protein. They are found at the beginning (5' UTR) and at the end (3' UTR) of the transcribed DNA.
In protein-coding genes the start codon is AUG, which codes for methionine. Protein-coding genes have both a start codon and a stop codon.
This is the classical diagram of a protein-coding gene:
- Promoter.
- 5' and 3' untranslated regions (UTRs): polymorphic; they affect the stability and translatability of the mRNA.
- Exons: coding regions (from the start codon, ATG, to the stop codon).
- Introns: sequences that interrupt the exons and may have regulatory functions in gene expression.
An exon is any nucleotide sequence encoded by a gene that remains in the final mature RNA product of that gene after the introns have been removed by RNA splicing.
The problem arises with genes that are not protein-coding. Most genes code for proteins, but others do not, and those are harder to identify.
In non-protein-coding genes we cannot find exons easily, because these exons do not code for protein. A simple definition of an exon: anything that remains after splicing.
When splicing removes the introns, all the remaining RNA sequence is exon.
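The "anything that remains after splicing" definition can be illustrated with a minimal Python sketch. It assumes we already know the exon coordinates (0-based, half-open), the sequence is made up, and `splice` and `introns_between` are hypothetical helper names; in the cell, of course, splicing is done by the splicing machinery, not from known coordinates.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals [start, end) in genomic order; everything else is intron."""
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exon intervals must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


def introns_between(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Introns are the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(end1, start2) for (_, end1), (start2, _) in zip(ordered, ordered[1:])]


# Hypothetical toy pre-mRNA: exon 1, one intron (lowercase), exon 2.
pre_mrna = "AUGGCC" + "guaaguuuuag" + "GAAUAA"
exons = [(0, 6), (17, 23)]

print(splice(pre_mrna, exons))   # AUGGCCGAAUAA
print(introns_between(exons))    # [(6, 17)]
```

Whatever `splice` returns is, by this definition, exonic sequence, whether or not it codes for protein.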
Introns. There are four types of introns:
- Spliceosomal introns: introns in nuclear protein-coding genes that are removed by spliceosomes. The spliceosome is the complex of proteins and small nuclear RNAs that splices messenger RNA; this is the typical splicing mechanism.
- tRNA introns: introns in nuclear and archaeal RNA genes that are removed by proteins (tRNA splicing enzymes). These genes share some features between eukaryotes and archaea, and specific proteins are responsible for this splicing.
Natalia Mingorance García 3r Biologia – UdG UNYBOOK: nattymg23 GENÒMICA
- Self-splicing introns, group I and group II (found in the three domains of life): they are removed by RNA catalysis. The difference between them is that in group I the key nucleotide is a guanine, whereas in group II it is an adenine. What the two groups have in common is that they do not need proteins: splicing is a function of the RNA itself. In the eukaryotic genome we can find different kinds of introns, which differ in how they are spliced.
Genes can also be classified by their transcription, mainly by the enzyme that transcribes them. Eukaryotes have three RNA polymerases, and each one transcribes a specific set of regions of the genome.
Gene classification based on the type of RNA polymerase: Class I genes are transcribed by RNA polymerase I. They are the ribosomal RNAs 5.8S, 18S and 28S (NOT the 5S rRNA). These are not protein-coding genes, but they account for about 50% of the transcription in eukaryotic cells: about half of all transcription is carried out by RNA polymerase I to produce ribosomal RNA, which the ribosomes need to translate proteins.
Class II genes: RNA polymerase II transcribes messenger RNA. The unprocessed transcript receives two modifications: a 7-methylguanosine cap at the 5' end and a poly(A) tail at the 3' end.
RNA polymerase II also transcribes small nuclear RNAs (about 150 nt) and microRNAs (about 22 nt), which have different functions:
- Small nuclear RNAs control, assist and regulate the splicing of genes; they are part of the spliceosome.
- MicroRNAs regulate the expression of other genes after transcription, by repressing translation or promoting degradation of their mRNAs.
About 60% of the genes in the human genome are targets of microRNAs. In other words, microRNAs are thought to regulate the expression of about 60% of our genes.
The last type is class III genes, transcribed by RNA polymerase III (for example tRNAs and the 5S rRNA); it is the least important class for this topic.
The dystrophin gene is one of the largest genes in our genome. It is located on the X chromosome (locus Xp21) and takes about 16 hours to transcribe. It has 79 exons and 78 introns (one of them a 400 kb intron). Gene size 2.2 Mb (0.07% of the human genome), primary transcript 2,400 kb, mature mRNA 14 kb, about 3,500 amino acids and 28 unknown isoforms.
Video: exons and introns – Melissa Moore (U Mass/HHMI)
EXAM: the dystrophin gene has a muscular function. When this gene is not functional, people suffer from a disease called muscular dystrophy.
Eukaryotic gene expression. Eukaryotic genes are split: they contain sequences that do not end up in the protein sequence (introns) interrupting the exons. The gene is first transcribed into pre-mRNA, which undergoes several processing steps. First it is capped with a 7-methylguanosine cap; at the 3' end it is cleaved and a poly(A) tail is added. In the middle, the intron sequences are literally spliced out (pre-mRNA splicing). After all the processing is done, the mRNA moves to the nuclear envelope, is exported to the cytoplasm and is then translated into protein.
The typical human gene is about 23,000 bp long and has about 7 introns. The median intron length is over ten times the exon length, which means that whenever a human gene is transcribed, 90 to 95% of the RNA is immediately spliced out and thrown away, and that seems rather wasteful.
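A back-of-the-envelope Python sketch shows why 7 introns that are each about ten times longer than an exon already imply roughly 90% of the transcript being discarded. It assumes, purely for illustration, that all exons have the same length and every intron has exactly the stated multiple of that length (UTRs ignored); `fraction_spliced_out` is a hypothetical helper.

```python
def fraction_spliced_out(n_introns: int, intron_to_exon_ratio: float) -> float:
    """Fraction of the primary transcript removed by splicing, assuming equal-length
    exons and introns that are each `intron_to_exon_ratio` times an exon's length."""
    n_exons = n_introns + 1                      # exons flank every intron
    total_intron = n_introns * intron_to_exon_ratio  # in units of one exon length
    return total_intron / (n_exons + total_intron)


# "About 7 introns" with introns "over ten times" the exon length:
print(f"{fraction_spliced_out(7, 10):.1%}")  # 89.7%
print(f"{fraction_spliced_out(7, 20):.1%}")  # 94.6%
```

Because the median intron is *over* ten times the exon length, the ratio of 10 gives a lower bound, consistent with the 90 to 95% quoted in the video.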
The dystrophin gene (DMD) encodes a protein that is necessary for our muscles, and mutations in this gene are one of the causes of muscular dystrophy. The DMD gene is the second largest gene in the human genome: it is 2.2 million base pairs long and has 79 exons and 78 introns, one of which is 400,000 nucleotides long (a 400 kb intron).
The introns are much longer than the exons.
It takes 16 hours for the polymerase to transcribe the entire dystrophin gene. Once all the introns are removed, the mRNA drawn to scale is tiny: it is a very long mRNA (17,000 bases), but it is less than one percent of the RNA originally transcribed.
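The dystrophin numbers can be checked with a few lines of Python arithmetic. The only assumption (labelled in the code) is that the primary transcript spans the full 2.2 Mb locus; the 17,000-base mRNA is the figure from the video, while the slide quotes 14 kb.

```python
GENE_LENGTH_NT = 2_200_000   # 2.2 Mb dystrophin locus (from the text)
TRANSCRIPTION_TIME_H = 16    # hours to transcribe the whole gene (from the text)
MATURE_MRNA_NT = 17_000      # mRNA length quoted in the video

# Assumption: the primary transcript covers the whole 2.2 Mb locus.
elongation_rate = GENE_LENGTH_NT / (TRANSCRIPTION_TIME_H * 3600)  # nt per second
kept_fraction = MATURE_MRNA_NT / GENE_LENGTH_NT

print(f"average elongation rate: ~{elongation_rate:.0f} nt/s")  # ~38 nt/s
print(f"fraction kept after splicing: {kept_fraction:.2%}")     # 0.77%
```

The result confirms the "less than one percent" statement: about 99.2% of the dystrophin primary transcript is intronic.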
Introns make up the majority of the gene; the exons are really tiny compared with the introns.
Pseudogenes: they are closely related to genes, but they are not actual genes.
Pseudogenes are dysfunctional relatives of genes. In the past they were functional genes, but they accumulated several mutations and are no longer functional, although they still have the typical structure of a gene.
Pseudogenes are dysfunctional relatives of genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell.
There are different kinds of pseudogenes:
- Non-processed (classical): they maintain the introns and the gene structure.
A non-processed pseudogene is a copy of a functional gene that may arise as a result of a gene duplication event and subsequently acquire mutations that cause it to become nonfunctional.
One example is the vitellogenin gene, which encodes a precursor of the egg yolk proteins.
Mammals still have this gene, with its structure and sequence maintained, but it is not functional. In this case, the event behind the pseudogene is a speciation event rather than a duplication: during the divergence of the lineages leading to birds and to mammals (including humans), the gene lost its function in mammals. We keep the gene, but with mutations that made it dysfunctional: it is no longer functional, yet it maintains the gene structure.
- Processed: no introns. The main difference is that they lack introns; they have lost the gene structure.
The mRNA transcript of a gene is spontaneously reverse-transcribed back into DNA and inserted into chromosomal DNA, so there are NO introns.
The main mechanism that originates non-processed (classical) pseudogenes is duplication, sometimes together with speciation. For processed pseudogenes, the main mechanism is retrotransposons (RNA transposons). Their particularity is that they carry a reverse transcriptase, which makes DNA from RNA. Using this reverse transcriptase, a gene that has been transcribed into RNA (and spliced) can be copied back into DNA without introns, and this DNA can then be inserted back into the genome.
So here the diagram represents the introns and the exons, with the whole structure maintained, but the gene has lost its function. After duplication (or speciation), the copy loses its function: we still have the gene, but it is not functional.
- NUMTs (nuclear mitochondrial DNA): nuclear pseudogenes of mitochondrial origin, i.e. pseudogenes of mitochondrial DNA (755 NUMTs, 38 to 15,000 bp). The human genome is estimated to contain about 19,000 pseudogenes.
Non-processed pseudogenes have introns and processed pseudogenes do not. Usually we think of pseudogenes as non-functional genes.
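The difference between the two origins can be sketched as a toy Python model: a DNA-level duplication copies the whole locus (introns included), whereas the retrotransposon route copies the spliced mRNA back into DNA. The sequences are hypothetical, reverse transcription is simplified to returning the sense-strand DNA (a real reverse transcriptase first makes the complementary strand), and `classify_copy` is a hypothetical exact-match rule rather than a real pseudogene-detection method.

```python
def transcribe(dna: str) -> str:
    """Sense-strand DNA -> RNA (T becomes U)."""
    return dna.replace("T", "U")


def splice(rna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon intervals [start, end), in order."""
    return "".join(rna[start:end] for start, end in sorted(exons))


def reverse_transcribe(rna: str) -> str:
    """Simplification: return DNA with the same sense as the RNA."""
    return rna.replace("U", "T")


def classify_copy(copy: str, intron_seqs: list[str]) -> str:
    """Toy rule: a copy that still carries the parent's introns is non-processed."""
    if any(intron in copy for intron in intron_seqs):
        return "non-processed (introns kept)"
    return "processed (no introns)"


# Hypothetical parent gene: exon, intron, exon.
intron = "GTAAGTTTTAG"
gene = "ATGGCC" + intron + "GAATAA"
exons = [(0, 6), (17, 23)]

duplicated_copy = gene                                            # DNA-level duplication
retrocopy = reverse_transcribe(splice(transcribe(gene), exons))   # mRNA -> DNA

print(retrocopy)                                 # ATGGCCGAATAA
print(classify_copy(duplicated_copy, [intron]))  # non-processed (introns kept)
print(classify_copy(retrocopy, [intron]))        # processed (no introns)
```

Real pseudogenes have also accumulated mutations, so an actual search would use sequence alignment instead of exact substring matching.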
Unusual cases: for instance, there is one pseudogene that has a function. The fgf4 gene regulates the growth and development of bones in mammals. It is a classical, functional gene, and dogs that carry only this gene have long legs. On chromosome 18 of some dogs there is a processed pseudogene (the fgf4 retrogene): a DNA copy of the gene without introns and without all of its regulatory sequences. This pseudogene can produce a transcript that is also translated into a functional protein. When that happens, we get dogs with short legs, called chondrodysplastic dogs: the presence of this retrogene in the genome of certain dogs gives a phenotype with extremely short legs.
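A comparable short-limbed phenotype exists in humans (dwarfism), although human achondroplasia is usually caused by FGFR3 mutations rather than by a retrogene. The key point of the dog case is that it is a processed pseudogene with a function.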
Functionality of a pseudogene (another example): in humans there is a normal gene on chromosome 10 called PTEN. This gene has a complete structure, and its function is tumor suppression. When it is transcribed, its 3' UTR contains two sequences called S1 and S2. They lie outside the coding region, in the UTR. The S1 sequence binds a microRNA.
MicroRNAs are extremely short RNA sequences that can bind to specific regions of mRNAs. So S1 is a target: it carries the sequence to which one specific microRNA can bind.
When this happens (S1 + microRNA), the PTEN mRNA is DOWN-REGULATED: its translation is blocked, there is no protein, and the PTEN gene is effectively not functional. Because PTEN is a tumor suppressor gene, this loss of function can lead to cancer.
But on chromosome 9 there is a processed pseudogene of PTEN called PTENP1. This pseudogene has lost the ability to produce protein, but it can still be transcribed into RNA, and that RNA still contains an S1 sequence to which the microRNA can bind.
The microRNA then binds the pseudogene transcript instead of the real PTEN mRNA, so the PTEN mRNA can now be translated and the tumor suppressor protein is produced.
The pseudogene acts as a DECOY (Catalan "esquer", bait) for microRNAs.
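The PTEN/PTENP1 decoy mechanism can be sketched in two steps in Python. First, binding sites are found by "seed" matching, a common prediction rule that is an assumption here (the notes only say S1 binds the microRNA): a site is perfectly complementary to nucleotides 2 to 8 of the microRNA. Second, a deliberately simple sharing model, also an assumption, in which a limited pool of microRNAs spreads over all matching sites in proportion to their number. All sequences and numbers are hypothetical, not the real PTEN, PTENP1 or microRNA sequences.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Start positions in `utr` perfectly complementary to miRNA nucleotides 2-8."""
    site = reverse_complement(mirna[1:8])
    hits, pos = [], utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


def mirna_on_target(n_mirnas: int, target_sites: int, decoy_sites: int = 0) -> float:
    """Toy sharing model: miRNAs spread over all matching sites in proportion to
    the number of sites (equal affinity, at most one miRNA per site)."""
    total_sites = target_sites + decoy_sites
    if total_sites == 0:
        return 0.0
    bound = min(n_mirnas, total_sites)
    return bound * target_sites / total_sites


# Hypothetical sequences.
mirna = "UCCGAUAAGCUUAGCAUGCAAU"
pten_utr = "GCAUUAUCGGAAACCCUUAUCGGAU"
print(seed_sites(mirna, pten_utr))   # [3, 16]

# 40 miRNA molecules, 100 sites on PTEN mRNAs, without and with 100 decoy sites.
print(mirna_on_target(40, 100))       # 40.0
print(mirna_on_target(40, 100, 100))  # 20.0
```

In this toy model, adding decoy sites with the same seed match halves the number of microRNAs sitting on the PTEN mRNA, which is the logic of the decoy effect; real binding also depends on affinities and concentrations.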
So the idea that pseudogenes have no function is not entirely true: some pseudogenes have specific functions.
What is a gene? The concept has changed over time. When the results of the ENCODE PROJECT (the ENCyclopedia Of DNA Elements) were released, we obtained the latest definition of a gene.
The ENCODE PROJECT started in 2003 (its pilot phase ran from 2003 to 2007) and it is still ongoing. Its main result was that 80% of the genome is transcribed (and has biochemical function). Only 1.5% of our genome consists of protein-coding sequences, and we have about 19,000 protein-coding genes. ENCODE highlighted the number and complexity of the RNA transcripts that the genome produces, and of its functional elements.
There were two definitions of a gene in use before the ENCODE PROJECT (pre-ENCODE definitions):
- Any DNA segment that contributes to function or phenotype.
- Any location in the genome that has a function and is associated with regulatory regions.
“A gene is a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology” (Human Genome Nomenclature Organization (2002)) “A gene is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions” (Sequence Ontology Consortium (2006)).
What did they find in the ENCODE PROJECT? The ENCODE results raised several problems with the current definition of a gene:
1. Gene regulation. Besides classical gene regulation, ENCODE found regulatory sequences far away from the gene or in trans, even on other, unlinked chromosomes, that regulate specific genes. There are regulatory sequences of genes even on other chromosomes and far away, and there are also regulatory sequences inside the exons, in the coding regions.
These sequences could affect practically every aspect of gene regulation.
Regulatory sequences can lie within the coding sequence, in flanking regions, or far away.
2. Gene location or overlapping. ENCODE found pairs of genes coding for different proteins in the same locus of the genome, read in different reading frames. We read the genome in groups of 3 nucleotides, and shifting the starting point changes the reading frame.
With two different reading frames we get two different proteins, and therefore different functions. This occurs in both eukaryotic and prokaryotic organisms.
They also found genes inside introns: two genes share the same locus because one gene is located in an intron of another. Inside the intron of gene A lie the exons and introns of gene B; transcription of one gene produces protein A and transcription of the other produces protein B. So in the same location we have two different genes. Genes can overlap one another, sharing the same DNA sequence in a different reading frame or on the opposite strand.
One gene can also be completely contained inside another gene's intron.
3. Splicing and trans-splicing. In classical splicing, one gene can have different or multiple functions thanks to alternative splicing: different combinations of its exons.
- Trans-splicing, the ligation of two separate mRNA molecules. Suppose gene A is on chromosome 1 and a different gene B is on chromosome 2, each with exons and introns. Usually gene A gives an mRNA with three exons, and gene B one with two exons. In trans-splicing, their transcripts are spliced together: two transcripts with different origins are ligated, giving a completely new mRNA and protein.
- Tandem chimerism: two consecutive genes are transcribed together into a single RNA, giving a single mRNA from two different genes and a fused protein.
- Homotypic trans-splicing: a single gene with a single mRNA can also produce (at least) two copies of the same mRNA. These two copies can be ligated into a single mRNA with duplicated exons: two identical transcripts of a gene generate an mRNA in which the same exon sequence is repeated.
4. Parasitic and mobile genes: transposons.
5. The large amount of "junk DNA" that is under selection and transcribed: 80% of our genome is transcribed, including some of the junk DNA and some pseudogenes (transcribed pseudogenes).
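Item 3 above (alternative splicing and the trans-splicing variants) can be sketched in Python by treating each transcript as an ordered list of exon labels. This is a toy model under stated assumptions: alternative splicing is reduced to exon skipping (first and last exons always kept), trans-splicing joins the 5' part of one transcript to the 3' part of another, and `cassette_isoforms` and `ligate` are hypothetical helpers.

```python
from itertools import combinations


def cassette_isoforms(exons: list[str]) -> list[list[str]]:
    """Exon skipping only: keep the first and last exons (at least two exons needed),
    keep any subset of the internal exons, and preserve exon order."""
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms


def ligate(five_prime: list[str], n_5p: int, three_prime: list[str], n_3p: int) -> list[str]:
    """Join the first n_5p exons of one transcript to the last n_3p exons of another."""
    return five_prime[:n_5p] + three_prime[len(three_prime) - n_3p:]


gene_a = ["A1", "A2", "A3"]  # hypothetical exons of gene A (chromosome 1)
gene_b = ["B1", "B2"]        # hypothetical exons of gene B (chromosome 2)

print(len(cassette_isoforms(["E1", "E2", "E3", "E4"])))  # 4
print(ligate(gene_a, 2, gene_b, 1))  # trans-splicing: ['A1', 'A2', 'B2']
print(gene_a + gene_b)               # tandem chimerism: exons of A followed by B
print(ligate(gene_a, 2, gene_a, 2))  # homotypic: ['A1', 'A2', 'A2', 'A3']
```

The homotypic case shows the repeated exon (A2) that results from ligating two copies of the same transcript.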
For instance, with intronic genes (genes inside an intron of another gene), the issue is that we have two genes in the same locus.
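Both kinds of "two genes in one locus" can be illustrated with a short Python sketch: a six-frame translation shows how one DNA stretch gives different peptides depending on the reading frame and strand, and an interval check tests whether gene B lies entirely inside an intron of gene A. The sequence and coordinates are hypothetical; the codon table is the standard genetic code; `six_frames` and `nested_in_intron` are hypothetical helper names, and `translate` only accepts A, C, G and T.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Three frames on the given strand (+) and three on the opposite strand (-)."""
    rc = reverse_complement(dna)
    frames = {f"+{f + 1}": translate(dna, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames


def nested_in_intron(gene_b: tuple[int, int], exons_a: list[tuple[int, int]]) -> bool:
    """True if gene B [start, end) lies entirely inside one intron of gene A."""
    ordered = sorted(exons_a)
    introns = [(end1, start2) for (_, end1), (start2, _) in zip(ordered, ordered[1:])]
    return any(start <= gene_b[0] and gene_b[1] <= end for start, end in introns)


frames = six_frames("ATGGCCATGAAATAG")  # hypothetical 15-nt sequence
print(frames["+1"], frames["+2"], frames["+3"], frames["-1"])  # MAMK* WP*N GHEI LFHGH

exons_a = [(100, 200), (5_000, 5_100)]  # hypothetical gene A, intron 200-5000
print(nested_in_intron((1_000, 3_000), exons_a))  # True
print(nested_in_intron((150, 1_000), exons_a))    # False (overlaps an exon of A)
```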
Results from ENCODE – dispersed genome activity: the latest ENCODE results added even more difficulties to the definition of a gene.
1. Unannotated transcription. DNA that is not annotated is transcribed into RNA; only 50% of the transcription activity corresponds to annotated genes. This unannotated transcription is detected as TARs (transcriptionally active regions) and transfrags (transcribed fragments, microRNA).
GENE ANNOTATION is the description of a gene: for instance its location, its function and the protein it codes for (if it codes for one); all the possible descriptions of a gene or of a genome. Annotated sequences are DNA sequences for which we have all this information; we know a lot about them.
What does unannotated transcription mean? DNA in our genome that we know little about is nevertheless transcribed into RNA. In other words, only 50% of the transcription activity corresponds to annotated genes; the other 50% is not annotated.
2. Unannotated and alternative TSSs (transcription start sites). Some proteins have unannotated transcription start sites, sometimes more than 100 kb upstream, and about half of the loci have alternative TSSs. In other words, some proteins have transcription start sites that are really far away from where the protein is encoded.
One gene (Gene 1), in this case on Chr1, can have a transcription start site very close to the location of the gene (as we expect for a gene); this is called CIS REGULATION.
The same gene 1 can also have a transcription start site on Chr2. In this case the transcription start site is very far away from the gene, and this is called TRANS REGULATION. Together with TRANS-splicing, this gives regions that mix sequences from Chr1 and Chr2, and different products of gene transcription. Even if the start site is on the same chromosome, if it is very far away it is considered TRANS regulation.
So there is transcription in regions we know little about, and transcription that starts far away from where the gene is located.
3. More alternative splicing: 5.4 transcripts per locus. Almost all genes in the human genome undergo alternative splicing, with an average of 5.4 transcripts per locus.
4. Dispersed regulation: rich and poor regions. The genome has many sequences that regulate transcription, distributed along the DNA. These regulatory sequences are usually located in the promoter (CIS regulation), but they can also be inside the gene or at a long distance from it.
5. Noncoding RNAs, pseudogenes and selectively constrained regions (including UCRs, ultraconserved regions). Noncoding RNAs can be tRNA, rRNA, microRNA, small RNAs and so on.
Pseudogenes: with transcription, with function.
Constrained elements: regions that apparently have no function but are under selection, so they do not accept mutations. They must have some kind of function that we do not know yet.
Ultraconserved regions (UCRs) are short regions (about 200 bp) that have remained identical across evolution: they carry no mutations.
They are not annotated sequences and have no known function, yet there is extreme selection pressure on them.
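The idea of an ultraconserved region can be expressed as a small Python scan: given two sequences that are already aligned, report the stretches where they are identical (no gaps) for at least a minimum length. This is a sketch under assumptions: real UCR searches work on whole-genome alignments of several species, `identical_runs` is a hypothetical helper, and the tiny threshold in the example is only for readability (the notes give about 200 bp).

```python
def identical_runs(seq1: str, seq2: str, min_len: int = 200) -> list[tuple[int, int]]:
    """Half-open [start, end) runs where two pre-aligned sequences are identical
    (gap characters '-' excluded) for at least `min_len` positions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    runs, start = [], None
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if a == b and a != "-":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(seq1) - start >= min_len:
        runs.append((start, len(seq1)))
    return runs


# Hypothetical toy alignment with one mismatch at position 9.
human = "AAAAACCCCCGGGGG"
mouse = "AAAAACCCCTGGGGG"
print(identical_runs(human, mouse, min_len=6))  # [(0, 9)]
print(identical_runs(human, mouse, min_len=5))  # [(0, 9), (10, 15)]
```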
With all these results, a new definition of a gene was proposed. A proposed updated definition:
1. A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein.
2. In the case that there are several functional products sharing overlapping regions, one takes the union of all overlapping genomic sequences coding for them, and this union is considered one gene. We can obtain different products through alternative splicing: mRNA GENE1, mRNA GENE2, mRNA GENE1 + GENE2.
3. This union must be coherent, i.e., done separately for final protein and RNA products, but does not require that all products necessarily share a common subsequence. Products without any function are not counted as part of the gene.
Each product has to be functional on its own, not necessarily together with the others.
“A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products”.
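The updated definition works like an interval-union procedure, which can be sketched in Python. Under these assumptions, each functional product is represented by the genomic intervals (exons) that encode it, products of one kind (all proteins or all RNAs, as point 3 requires) are grouped when their intervals overlap directly or through a chain of other products, and each group's gene is the union of its intervals. The product names and coordinates are hypothetical, and `genes_from_products` is a hypothetical helper.

```python
def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def merge(intervals):
    """Union of half-open intervals, returned sorted and non-overlapping."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def genes_from_products(products: dict[str, list[tuple[int, int]]]):
    """Group products of ONE kind whose intervals overlap (directly or via a chain)
    and return each group with the union of its genomic intervals."""
    parent = {name: name for name in products}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    names = list(products)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if any(overlaps(x, y) for x in products[a] for y in products[b]):
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return [(members, merge([iv for m in members for iv in products[m]]))
            for members in groups.values()]


# Hypothetical protein products (exon coordinates).
proteins = {
    "p1": [(0, 100), (300, 400)],
    "p2": [(300, 400), (600, 700)],
    "p3": [(5_000, 5_200)],
    "p4": [(600, 700), (900, 1_000)],
}
for members, union in genes_from_products(proteins):
    print(members, union)
# ['p1', 'p2', 'p4'] [(0, 100), (300, 400), (600, 700), (900, 1000)]
# ['p3'] [(5000, 5200)]
```

Here p1 and p4 share no sequence, yet they end up in the same gene through p2, which is exactly the "does not require that all products necessarily share a common subsequence" clause.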
Summary – Genes
- Classical structure of a gene: basic definition of a gene, exon definition, introns.
- Gene classification by polymerase.
- Pseudogene types: processed, non-processed, NUMTs.
- Pseudogene function.
- Gene definition issues: ENCODE.
- Updated gene definition.