Project description: Centromeres are functionally conserved chromosomal loci essential for proper chromosome segregation during cell division, yet they show high sequence diversity across species. A near-universal feature of centromeres is the presence of repetitive sequences, such as satellites and transposable elements (TEs). Because of their rapidly evolving karyotypes, gibbons represent a compelling model to investigate divergence of functional centromere sequences across short evolutionary timescales. Previously, we identified a novel composite retrotransposon, LAVA, that is exclusive to gibbons and expanded within the centromere regions of one gibbon genus, Hoolock. In this study, we use ChIP-seq, RNA-seq and fluorescence in situ hybridization to comprehensively investigate the repeat content of centromeres of the four extant gibbon genera (Hoolock, Hylobates, Nomascus and Siamang). We find that CENP-A nucleosomes and the DNA-protein interface with the inner kinetochore are enriched in retroelements, rather than satellite DNA, in all gibbon genera. We find that LAVA in Hoolock is enriched in the centromeres of most chromosomes and shows centromere- and species-specific sequence and structural differences compared to other genera, potentially as a result of its co-option to a centromeric function. In contrast, we find that a centromeric retroelement-derived macrosatellite, SST1, corresponds with chromosome breakpoint reuse across gibbons and shows high sequence conservation across genera. Finally, using de novo assembly of centromere-specific sequences, we determine that transcripts originating from gibbon centromeres recapitulate species-specific TE diversity. Combined, our data reveal dynamic, species-specific shifts in repeat content that define gibbon centromeres and coincide with the extensive karyotypic diversity observed within this lineage.
Project description: The relationship between evolutionary genome remodeling and the three-dimensional structure of the genome remains largely unexplored. Here we use the heavily rearranged gibbon genome to examine how evolutionary chromosomal rearrangements impact genome-wide chromatin interactions, topologically associating domains (TADs), and their epigenetic landscape. We use high-resolution maps of gibbon-human breaks of synteny (BOS), apply Hi-C in gibbon, measure an array of epigenetic features, and perform cross-species comparisons. We find that gibbon rearrangements occur at TAD boundaries, independent of the parameters used to identify TADs. This overlap is supported by a remarkable genetic and epigenetic similarity between BOS and TAD boundaries, namely the presence of CpG islands and SINE elements, and enrichment in CTCF and H3K4me3 binding. Cross-species comparisons reveal that regions orthologous to BOS also correspond with boundaries of large (400-600 kb) TADs in human and other mammalian species. The co-localization of rearrangement breakpoints and TAD boundaries may be due to higher chromatin fragility at these locations and/or increased selective pressure against rearrangements that disrupt TAD integrity. We also examine the small portion of BOS that did not overlap with TAD boundaries and gave rise to novel TADs in the gibbon genome. We postulate that these new TADs generally lack deleterious consequences. Lastly, we show that limited epigenetic homogenization occurs across breakpoints, irrespective of their time of occurrence in the gibbon lineage. Overall, our findings demonstrate remarkable conservation of chromatin interactions and epigenetic landscape in gibbons, in spite of extensive genomic shuffling.
Project description: Macaque species share over 93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. To close this gap and enhance functional genomics approaches, we employed a combination of de novo linked-read assembly and scaffolding using a proximity ligation assay (Hi-C) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large, chromosome-level scaffolds with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells (iPSCs) derived from the same animal. Reconstruction of the evolutionary tree using whole-genome annotation and orthologous comparisons among three macaque species, human and mouse genomes revealed extensive homology between human and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques exhibit a closer evolutionary distance to each other than either species exhibits to humans or pig-tailed macaques. These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.
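The scaffold N50 and the "k largest scaffolds cover X% of the genome" figures quoted in the description above are both simple functions of the list of scaffold lengths. The following is a minimal sketch of how such statistics are conventionally computed, not the authors' pipeline; the function names `n50` and `top_k_fraction` and the toy lengths are illustrative assumptions.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of scaffold (or contig) lengths.

    N50 is the length L of the scaffold at which the running total,
    taken over scaffolds sorted from longest to shortest, first reaches
    at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


def top_k_fraction(lengths: Sequence[int], k: int) -> float:
    """Return the fraction of total assembly length in the k longest scaffolds."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Toy scaffold lengths (hypothetical, arbitrary units), not real assembly data.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
toy_top2 = top_k_fraction(toy_lengths, 2)
```

With the toy lengths, the total is 290, so the running sum first reaches half (145) at the second scaffold, giving an N50 of 70. Applied to real data, `lengths` would be read from the assembly's FASTA index or a similar per-scaffold length table.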
Project description: Chromosome rearrangements in small apes are up to 20 times more frequent than in most mammals. Because of their complexity, the full extent of chromosome evolution in these hominoids is not yet fully documented. However, previous work with array painting, BAC-FISH and selective sequencing in two of the four karyomorphs has shown that high-resolution methods can precisely define chromosome breakpoints and map the complex flow of evolutionary chromosome rearrangements. Here we use these tools to precisely define the rearrangements that have occurred in the remaining two karyomorphs, genera Symphalangus (2n=50) and Hoolock (2n=38). This research provides the most comprehensive insight into the evolutionary origins of the chromosome rearrangements involved in transforming the small ape genome. Bioinformatics analyses of the human-gibbon synteny breakpoints revealed an association with transposable elements and segmental duplications, providing some insight into the mechanisms that might have promoted rearrangements in small apes. In the near future, the comparison of gibbon genome sequences will provide novel insights to test hypotheses concerning the mechanisms of chromosome evolution. The precise definition of synteny block boundaries and orientation, chromosomal fusions, and centromere repositioning events presented here will facilitate genome sequence assembly for these close relatives of humans.
Project description: Duckweeds are a monophyletic group of rapidly reproducing aquatic monocots in the Lemnaceae family. Spirodela polyrhiza, the Greater Duckweed, has the largest body plan yet the smallest genome size in the family (1C = 150 Mb). Given their clonal, exponentially fast reproduction, a key question is whether genome structure is conserved across the species in the absence of meiotic recombination. We generated a highly contiguous, chromosome-scale assembly of Spirodela polyrhiza line Sp7498 using Oxford Nanopore plus Hi-C scaffolding (Sp7498_HiC) that is highly syntenic with a related line (Sp9509). Both the Sp7498_HiC and Sp9509 genome assemblies reveal large chromosomal misorientations in a recent PacBio assembly of Sp7498, highlighting the necessity of orthogonal long-range scaffolding techniques like Hi-C and BioNano optical mapping. Proteome analysis of Sp7498 verified the expression of nearly 2,250 proteins and revealed a high level of proteins involved in photosynthesis and carbohydrate metabolism, among other functions. In addition, a strong increase in chloroplast proteins was observed that correlated with chloroplast density. This Sp7498_HiC genome was generated cheaply and quickly with a single Oxford Nanopore MinION flow cell and one Hi-C library in a classroom setting. Combining these data with a mass spectrometry-generated proteome demonstrates that duckweed is a model for genomics- and proteomics-based education.
Project description: BACKGROUND: Plants exhibit wide chemical diversity due to the production of specialized metabolites that function as pollinator attractants, defensive compounds, and signaling molecules. Lamiaceae (mints) are known for their chemodiversity and have been cultivated for use as culinary herbs, as well as sources of insect repellents, health-promoting compounds, and fragrance. FINDINGS: We report the chromosome-scale genome assembly of Callicarpa americana L. (American beautyberry), a species within the early-diverging Callicarpoideae clade of Lamiaceae, known for its metallic purple fruits and use as an insect repellent due to its production of terpenoids. Using long-read sequencing and Hi-C scaffolding, we generated a 506.1-Mb assembly spanning 17 pseudomolecules with N50 contig and N50 scaffold sizes of 7.5 and 29.0 Mb, respectively. In all, 32,164 genes were annotated, including 53 candidate terpene synthases and 47 putative clusters of specialized metabolite biosynthetic pathways. Our analyses revealed 3 putative whole-genome duplication events, which, together with local tandem duplications, contributed to gene family expansion of terpene synthases. Kolavenyl diphosphate is a gateway to many of the bioactive terpenoids in C. americana; experimental validation confirmed that CamTPS2 encodes kolavenyl diphosphate synthase. Syntenic analyses with Tectona grandis L. f. (teak), a member of the Tectonoideae clade of Lamiaceae known for exceptionally strong wood resistant to insects, revealed 963 collinear blocks and 21,297 C. americana syntelogs. CONCLUSIONS: Access to the C. americana genome provides a road map for rapid discovery of genes encoding plant-derived agrichemicals and a key resource for understanding the evolution of chemical diversity in Lamiaceae.
Project description: The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective fashion. Here, we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67X coverage, Sample GSM1551550). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Aedes aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that virtually all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, accurate, and can be applied to many species.