Topic 2, Genomics (2017). Notes in English
Includes the notes for topic 2 of the Genomics course, with the relevant illustrations. THE STRUCTURE OF THE EUKARYOTIC GENOME.
Natalia Mingorance García
3r Biologia – UdG
TOPIC 2: THE STRUCTURE OF THE EUKARYOTIC GENOME
The main difference between prokaryotes and eukaryotes is the presence of true chromosomes (only in Eukarya).
The chromosome is mainly linear and is a combination of DNA and proteins.
Chromosomes are inside the cell; the karyotype is the number of chromosomes and their organization in the cell.
The DNA in the eukaryotic cell is extremely packed. We have different kinds of DNA fibers in the cell: 11 nm… Synthetic biology is the construction of new synthetic cells or organisms. It is based on the minimal genome.
Video: “Synthetic biology explained”. Putting the engineering back into genetic engineering
After ten thousand years of genetic manipulation by selective breeding, humans finally gained direct access to the genetic code, DNA. Since then, we’ve cut and pasted it, photocopied fragments of it en masse, speed-read it with sequencers, printed out the code letter by letter in the lab, modeled it on computers and measured it with microscopes.
For forty years now we’ve called this work genetic engineering. The trouble is that while there’s been an extraordinary amount of genetic discovery and manipulation, there’s been precious little engineering. Engineers are frustrated by genetics and molecular biology. The experiments are too slow; the complexity is too messy and growing more so all the time. And there’s a frustrating lack of standardized components.
They’d like to do with genetic engineering what engineers have done since the Stone Age: collect, refine and repackage nature, so that it’s easier to make new and reliable things.
Engineers want to treat DNA more like a programming language: instead of ones and zeros, ‘A’, ‘T’, ‘G’ and ‘C’. The key is to use DNA to write simple, Lego-like functional components inspired by, but not found in, nature and then run them in a cell instead of a computer. The only difference is that this software builds its own hardware.
They call this re-engineered genetic engineering synthetic biology.
Nowadays, rather than cut and paste the DNA sequence out of an organism and into another you can, if you know what you are doing, just type your DNA sequence into a computer, or copy from a database, or even select it from a growing component catalog and then you just order it over the internet.
The DNA sequence may be copied from nature but the DNA itself is made by machine.
It’s synthetic. The raw material for synthesizing DNA is sugar. Twenty five dollars of which will buy you enough to make a copy of every human genome on the planet. The chemical letters are fed to the DNA equivalent of an industrial inkjet printer. In goes your sequence information and out comes DNA at a cost of less than forty cents per base pair and getting cheaper all the time. It’s then freeze-dried and shipped to your door.
Already engineers have assembled an open-source catalog of over five thousand standardized components called BioBricks. At an annual worldwide do-it-yourself competition, university students build new and more complex BioBricks, string them together and then run them inside a much-studied intestinal bacterium: E. coli. Sure, they’re toy projects with shoestring budgets, but the results are impressive. E. chromi, a sensitivity tuner and color generator, is programmed to turn one of five colors when it detects a certain concentration of an environmental toxin. E. Coliroid is a bacterial system which switches on and off in response to red light and acts like a bacterial Polaroid camera. Groups with more time and a lot more money are rewriting, or as they say in computer programming, refactoring, whole systems.
Jay Keasling has built and continually refined a new metabolic pathway in yeast by assembling 10 genes from three organisms (E. coli, Saccharomyces cerevisiae and Artemisia annua) in an attempt to produce the antimalarial drug artemisinin synthetically, and to do it cheaply enough to treat up to two hundred million malaria sufferers each year.
Craig Venter has entirely replaced the DNA of one bacterium with a synthetic copy of the DNA from another naturally occurring species, plus a few extras (Mycoplasma mycoides into Mycoplasma capricolum). This wasn’t creating life; it was testing just how reprogrammable a bacterial cell can be. It is an important step if we want biological factories that can be tasked with making many things, like vaccines, medicines, food and even fuel.
In the last ten thousand years genetics has taken us from gathering seeds to manipulating DNA. Engineering has taken us from rocks and caves to hand-held computers and skyscrapers. We can only guess what the two working together as synthetic biology may help us achieve in the future, but the possibilities are breathtaking: engineering algae that can eat climate-changing carbon dioxide and produce less polluting biofuels. We might do away with both liver and kidney transplants and instead use a vat-grown all-purpose biological sieve organ called a kliver.
We could change the nature of construction, architecture, urban planning, forestry and even gardening… “Testing of understanding by building is the shortest path to demonstrating what you know and what you don’t”.
Synthetic biology is testing and expanding our knowledge of cellular function.
Eukaryotic genome
One of the main differences from prokaryotes is that eukaryotic cells have chromosomes.
DNA in eukaryotes is tightly packed into nucleosomes, and these fold into fibers of increasing size until they form the chromosome.
Chromosomes are only visible during cell division (mitosis and meiosis), most clearly in metaphase and anaphase.
Chromosomes are a combination of DNA and proteins (this combination is called chromatin) and they are extremely condensed. Within a eukaryotic chromosome we can distinguish two kinds of chromatin:
- Euchromatin is lightly packed; it is not extremely condensed. It is where transcription takes place and it contains the majority of the genes (92%).
- Heterochromatin is a tightly packed form of DNA. It can be constitutive (the DNA is always packed) or facultative (some sequences switch between euchromatin and heterochromatin).
Heterochromatin:
- Gene regulation
- Integrity of chromosomes
- Clonally inherited
- Epigenetic mark
- Position-effect variegation / insulator
- Constitutive heterochromatin: associated with centromeres and telomeres
- Facultative heterochromatin: cell differentiation, X-chromosome inactivation, imprinting, Barr body
Heterochromatin regulates gene transcription and also helps maintain the integrity and structure of chromosomes. It is clonally inherited: if a DNA sequence is in a heterochromatin state in the mother cell, the same sequence is heterochromatin in the daughter cells.
The genes included in these regions will not be functional; they will not be transcribed.
It is an epigenetic mark: transcription is controlled without modifying the DNA sequence.
The boundary of the heterochromatin can shift a little. Take a gene on a chromosome that is distributed to two daughter cells at cell division. The gene is in the same position in both, but in some cases the epigenetic marks extend to include the adjacent sequence: the heterochromatin boundary moves a little.
When this happens, one daughter cell can transcribe the gene but the other (in which the heterochromatin has shifted or spread; both are possible) cannot. This is called position-effect variegation: the movement of heterochromatin.
Some DNA sequences, called insulators, block this movement. Genes flanked by insulator sequences are protected, so heterochromatin cannot spread into them.
The movement of heterochromatin has two kinds of causes: errors (nothing is perfect), and regulation, since heterochromatin carries proteins (histone modifications) that exert some control over gene expression. It is a real mechanism of gene expression control.
Insulators are associated with housekeeping (essential) genes and protect them from being modified. The movement only happens at cell division.
Heterochromatin has a special role in the regulation of expression, and again there are two kinds of heterochromatin.
Constitutive: always associated with centromeres and telomeres. Chromosomes have specific regions that are always heterochromatin. Weak or no gene expression.
Facultative: DNA regions that can change between heterochromatin and euchromatin.
Depending on the cell, a particular region is euchromatin while in another cell it is heterochromatin; it can also change with time or across generations… Examples are X-chromosome inactivation and epigenetic imprinting. X inactivation happens mainly in females.
One of the two X chromosomes is inactivated.
Which X is inactivated is independent of its origin: the heterochromatic X can be the one that came from the mother or the one that came from the father.
Imprinting means that the chromosome has been modified epigenetically (its structure, through histone modifications). The chromosome is imprinted. It always happens. These X chromosomes can be in heterochromatin or in euchromatin. Inheritance can activate or deactivate them.
Genome size
How much DNA do we have in our cells?
C-value: the amount of DNA in a cell, usually measured in a haploid nucleus (a gamete). It can be expressed in picograms (pg) or in megabases (Mb); 1 pg is approximately 978 Mb. How much DNA is there in human cells? About 3.5 Gb in our genome.
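The two units for the C-value can be converted into each other. The following is a minimal Python sketch, assuming the commonly used factor of about 978 Mb per picogram of double-stranded DNA; the function names `pg_to_mb` and `mb_to_pg` are made up for this illustration, and the 3.5 Gb human figure is the one given in these notes.

```python
# Assumed conversion factor: 1 pg of double-stranded DNA is roughly 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a C-value in picograms to megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a genome size in megabases to picograms."""
    return megabases / MB_PER_PG


# Human genome size as given in the notes (3.5 Gb = 3500 Mb).
human_mb = 3500.0
print(f"{human_mb / 1000} Gb is about {mb_to_pg(human_mb):.2f} pg")
```

With this factor, a genome quoted in Gb and a C-value quoted in pg have similar numerical values, which is why the two are easy to confuse.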
When we compare the amount of DNA in different species, we find something very strange. We would expect that more complex functions require more DNA, to hold more information. However, the lungfish has a genome of over 133 Gb, and some worms have genomes larger than those of mammals although they are certainly less complex. So there is no clear relationship between genome size and the complexity of the organism. This lack of correspondence is called the C-value paradox. Genome size is NOT an indicator of the genomic or biological complexity of an organism.
Genome size and gene number (graph): protein-encoding genes against genome size. Blue squares are viruses, red circles are prokaryotes and green triangles are eukaryotes. In bacteria, and even in archaea, there is a clear relationship between genome size and gene number. In Eukarya there is a lack of correspondence between the number of protein-coding genes and organism complexity, which is strange.
How many genes do we have in our genome? 20,000 to 25,000. For instance, the parasite Trichomonas vaginalis has the largest number of protein-coding genes (60,000). So we have a small genome and also very few protein-coding genes.
GENE ANNOTATION: studying the function of genomes. We have the sequence and the location. When you annotate a genome you identify all the genes in it and all the other features that characterize it. In the latest annotation of the human genome there are fewer than 19,000 protein-coding genes, so the figure of 20,000 is outdated. The message is that we have a relatively small genome and very few protein-coding genes, and in spite of that we are relatively complex organisms. So something strange is happening. (Transposable elements: 45%.)
C-value paradox and gene number
Birds have extremely small genomes, which is probably related to adaptation to flight.
There is no correlation between the number of protein-coding genes and the C-value.
The latest annotation of the human genome shows that we have fewer than 20,000 protein-coding genes, which is a small number.
This is a typical chart of the components of the human genome. Protein-coding sequences are only around 1.5% of the total genome.
The largest share is repetitive elements, which make up about 50% of the genome.
Introns are about ¼ of the genome. Using another classification, transposons (LTR, SINEs, LINEs; the jumping genes), some of which are repetitive and some not, make up about 45% of the genome.
Only 1.5% of our genome is protein-coding. Transposons carry some genes of their own.
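To get a feel for what these percentages mean in base pairs, here is a small Python sketch. It only uses figures quoted in these notes (a 3.5 Gb genome; 1.5% protein-coding, about 25% introns, about 45% transposons); the helper name `fraction_to_bp` is hypothetical.

```python
# Genome size and fractions as quoted in the notes.
GENOME_SIZE_BP = 3_500_000_000  # 3.5 Gb

FRACTIONS = {
    "protein-coding sequence": 0.015,
    "introns": 0.25,
    "transposable elements": 0.45,
}


def fraction_to_bp(fraction: float, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Return the number of base pairs covered by a fraction of the genome."""
    return round(fraction * genome_size_bp)


# The categories come from different classifications and can overlap
# (for example, transposons located inside introns), so they are not summed.
for name, fraction in FRACTIONS.items():
    print(f"{name}: {fraction_to_bp(fraction) / 1e6:,.1f} Mb")
```

Under these assumptions the protein-coding part is only about 52.5 Mb, compared with well over 1 Gb of transposable elements.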
Continuing with the analysis of the C-value paradox: pseudogenes. Pseudogenes are dysfunctional relatives of genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell. They are usually duplicated genes that have lost their protein-coding ability.
So pseudogenes are sequences that were genes in the past and lost their function because of a mutation. There are classical pseudogenes, processed pseudogenes and Numts.
Numts are fragments of mitochondrial DNA that have been incorporated into the nuclear genome. They are usually non-functional. We have about as many pseudogenes as protein-coding genes (19,000).
Analyzing the C-value again, in this case intron size is plotted against the C-value.
Here, larger genomes have larger introns: there is a positive correlation between genome size and intron size. In our genome we have a large number of pseudogenes and our introns are also larger.
C-value paradox and gene number: less complex organisms have fewer genes and also fewer introns per gene. One of the most complex organisms, the human, has only 19,000 protein-coding genes but a lot of introns per gene: on average, about ten.
A larger number of introns means a larger number of exons, so the number of exons clearly increases in complex organisms. Gene size, counted as exons plus introns, is also larger in complex organisms, and is largest in mammals: with more exons and more introns we get bigger genes.
Genome size and complexity are not correlated.
Genome size and the number of protein-coding genes are not correlated. Underlined in yellow we have the C-value paradox. Genome size and intron size are correlated.
In humans, approximately 25% of the genome is made up of introns.
The human genome is about 3.5 Gb and about 50% of it is non-coding repetitive sequence.
Larger genes (genes here means exons + introns), purple chart: one explanation for a genome size that does not reflect complexity is that human genes have many introns and undergo alternative splicing. This can explain how very few genes produce an extremely complex organism: with many introns per gene there are many different ways to splice each transcript, that is, different combinations of exons.
One of the most extreme examples is the Drosophila Dscam gene, a single gene of about 60,000 bp.
Different combinations of its exons can produce a huge number of different proteins from a single gene (38,016 possible isoforms is the commonly cited figure). This is alternative splicing.
The point is that complex organisms have more introns, and alternative splicing helps resolve the complexity problem.
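The combinatorial effect of alternative splicing can be made concrete with Dscam. The sketch below multiplies the number of mutually exclusive alternatives in each variable exon cluster. The cluster sizes (12, 48, 33 and 2 for exons 4, 6, 9 and 17) are not in these notes; they are the values usually reported for Drosophila Dscam and are an assumption here. The function name `count_isoforms` is invented for the example.

```python
from math import prod

# Assumed number of mutually exclusive alternatives per variable exon cluster
# (values usually reported for Drosophila Dscam, not taken from the notes).
DSCAM_EXON_CLUSTERS = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}


def count_isoforms(clusters: dict[str, int]) -> int:
    """One alternative is chosen per cluster, so the choices multiply."""
    return prod(clusters.values())


print(count_isoforms(DSCAM_EXON_CLUSTERS))  # 38016
```

A single gene can therefore encode more potential protein variants than the whole human genome has protein-coding genes, which is why splicing is one of the proposed answers to the paradox.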
Pink chart: genes are organized in families. A gene family is the set of genes related by homology. Genes in the same family usually have similar functions. The number of different gene families is more or less maintained from worms to mammals (there is only a small increase). In contrast, the number of different genes in each family increases. That means complex organisms have more gene diversity, with different functions.
Genome size and coding DNA: the chart shows the amount of coding DNA (blue) and of non-coding elements (yellow). The percentage of coding DNA decreases as complexity increases, down to 1.5% of the human genome. Obviously, if the coding fraction decreases, the non-coding fraction increases.
This non-coding DNA contains regulatory sequences. The increase in non-coding DNA is related to the number of regulatory elements (yellow pieces). The expansion of non-coding and repeated DNA (intronic, intergenic and interspersed repeat sequences) increases the number of epigenetic marks and the size and complexity of transcription units.
The epigenetic marks are also located in the non-coding DNA. So more complex organisms have more complex regulation of gene expression.
So we can almost resolve the C-value paradox. Complexity cannot be explained by genome size, but it can be explained by splicing (introns), the number of different genes per family and complex regulation. With only 19,000 genes we can produce an extremely complex organism because of these 3 mechanisms.
The regulatory elements lie in the non-coding DNA, which is clearly larger in these genomes. Genome size is related to complex regulation.
Protein-coding genes and transposable elements (jumping genes): the pattern is always the same. As genome size increases, the percentage of protein-coding sequence decreases and that of transposons increases. In the human genome, about 45% is transposable elements.
The chart shows the percentage of repetitive elements and the estimated number of families in the genomes of different organisms. We have different kinds of transposons in our genome (LINE/SINE, LTR, DNA transposons…).
45% of our genome is transposons. There are two kinds:
- CLASS I: RNA transposons (virus-like): LINE/SINE (33.4%), LTR (8.1%). In a sense we are transgenic organisms, because our genome contains viral genes.
- CLASS II: DNA transposons (2.8%). In this chart the number of DNA transposons fluctuates between organisms. Even this roughly 3% is larger than the fraction of protein-coding sequence. 25% of our genome is of viral origin.
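As a quick consistency check, the class figures can be added up and compared with the 45% total. This is only a sketch: it assumes the LTR figure is 8.1% (the value in the source appears garbled), and the dictionary and function names are made up for the example.

```python
# Percent of the human genome per transposon group, as read from the notes.
TRANSPOSON_PERCENT = {
    "LINE/SINE (class I)": 33.4,
    "LTR (class I)": 8.1,  # assumed reading of the garbled figure in the notes
    "DNA transposons (class II)": 2.8,
}


def total_percent(parts: dict[str, float]) -> float:
    """Sum the percentages, rounded to one decimal to hide float noise."""
    return round(sum(parts.values()), 1)


class_one = {k: v for k, v in TRANSPOSON_PERCENT.items() if "class I)" in k}
print("Class I:", total_percent(class_one), "%")
print("All transposons:", total_percent(TRANSPOSON_PERCENT), "%")
```

Under that assumption the groups add up to about 44%, in line with the roughly 45% quoted for transposable elements as a whole.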
To sum up, the complexity of an organism is not determined by the amount of DNA or by the number of coding sequences; it depends on how the genome functions (splicing, pseudogenes…). The C-value paradox can be explained by the presence of repetitive elements in the genome and by the complex expression of the genome (splicing, regulation of gene expression and the number of different genes).
Organismal complexity is the result of much more than the sheer number of nucleotides that compose a genome and the number of coding sequences in that genome: splicing (human >95%), trans-splicing, tandem chimerism, duplication, pseudogenes… The C-value paradox and gene number are mainly explained by the unequal distribution of repetitive elements in eukaryotic genomes and by the expression of the genome (splicing and others).