Project description:Macaque species share over 93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind that of the human genome. To close this gap and enhance functional genomics approaches, we combined de novo linked-read assembly with scaffolding by proximity ligation (Hi-C) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combined approach yielded chromosome-level scaffolds with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the genome. The assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells (iPSCs) derived from the same animal. Reconstruction of the evolutionary tree from whole-genome annotation and orthologous comparisons among three macaque species, human, and mouse revealed extensive homology between humans and pig-tailed macaques in both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques are evolutionarily closer to each other than either species is to humans or pig-tailed macaques. These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.
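The two contiguity figures quoted above (scaffold N50, and the share of the genome held by the largest scaffolds) follow from simple arithmetic on the sorted scaffold lengths. The minimal sketch below illustrates that arithmetic; the function names and the toy lengths are hypothetical and are not taken from the M. nemestrina assembly or from any particular assembly-statistics tool.

```python
def n50(lengths):
    """Return the N50: the length L of the sequence at which the running sum
    of lengths, taken from longest to shortest, first reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be a non-empty list")


def fraction_in_largest(lengths, k):
    """Fraction of the total assembly length contained in the k longest sequences."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Toy scaffold lengths in Mb (hypothetical values for illustration only).
scaffolds = [200, 150, 120, 80, 30, 15, 5]
print(n50(scaffolds))                              # 150
print(round(fraction_in_largest(scaffolds, 3), 3))  # 0.783
```

Here the total is 600 Mb, so the running sum first passes 300 Mb at the second scaffold (200 + 150), giving an N50 of 150 Mb; the three largest scaffolds hold 470 of 600 Mb.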
Project description:<p><strong>BACKGROUND:</strong> Plants exhibit wide chemical diversity due to the production of specialized metabolites that function as pollinator attractants, defensive compounds, and signaling molecules. Lamiaceae (mints) are known for their chemodiversity and have been cultivated for use as culinary herbs, as well as sources of insect repellents, health-promoting compounds, and fragrance.</p><p><strong>FINDINGS:</strong> We report the chromosome-scale genome assembly of Callicarpa americana L. (American beautyberry), a species within the early-diverging Callicarpoideae clade of Lamiaceae, known for its metallic purple fruits and use as an insect repellent due to its production of terpenoids. Using long-read sequencing and Hi-C scaffolding, we generated a 506.1-Mb assembly spanning 17 pseudomolecules with N50 contig and N50 scaffold sizes of 7.5 and 29.0 Mb, respectively. In all, 32,164 genes were annotated, including 53 candidate terpene synthases and 47 putative clusters of specialized metabolite biosynthetic pathways. Our analyses revealed 3 putative whole-genome duplication events, which, together with local tandem duplications, contributed to gene family expansion of terpene synthases. Kolavenyl diphosphate is a gateway to many of the bioactive terpenoids in C. americana; experimental validation confirmed that CamTPS2 encodes kolavenyl diphosphate synthase. Syntenic analyses with Tectona grandis L. f. (teak), a member of the Tectonoideae clade of Lamiaceae known for exceptionally strong wood resistant to insects, revealed 963 collinear blocks and 21,297 C. americana syntelogs.</p><p><strong>CONCLUSIONS:</strong> Access to the C. americana genome provides a road map for rapid discovery of genes encoding plant-derived agrichemicals and a key resource for understanding the evolution of chemical diversity in Lamiaceae.</p>
Project description:Porcine 60K BeadChip genotyping arrays (Illumina) are increasingly being applied in pig genomics to validate SNPs identified by re-sequencing or by the assembly-versus-assembly method. Here we report that more than 98% of the SNPs identified with the porcine 60K BeadChip genotyping array (Illumina) were consistent with those identified by the assembly-based method. This result demonstrates that whole-genome de novo assembly is a reliable approach for deriving accurate SNP maps.
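The record does not say exactly how "consistent" was scored. The minimal sketch below assumes genotype-level agreement at positions called by both the array and the assembly-based method, with unordered alleles (so "AG" equals "GA"). The function name and input layout (dicts keyed by chromosome and position) are hypothetical, and real comparisons would also need to reconcile strand and allele coding between the array manifest and the reference.

```python
def concordance(array_calls, assembly_calls):
    """Fraction of shared (chromosome, position) sites at which the array
    genotype matches the assembly-based genotype, ignoring allele order.
    Sites present in only one call set are skipped."""

    def normalize(genotype):
        # Treat genotypes as unordered allele pairs, e.g. "GA" -> "AG".
        return "".join(sorted(genotype))

    shared = array_calls.keys() & assembly_calls.keys()
    if not shared:
        raise ValueError("no sites in common between the two call sets")
    matches = sum(
        1 for site in shared
        if normalize(array_calls[site]) == normalize(assembly_calls[site])
    )
    return matches / len(shared)


# Hypothetical toy genotypes keyed by (chromosome, position).
array = {("1", 100): "AG", ("1", 250): "CC", ("2", 40): "TT", ("2", 90): "AC"}
assembly = {("1", 100): "GA", ("1", 250): "CT", ("2", 40): "TT", ("3", 5): "GG"}
print(round(concordance(array, assembly), 3))  # 0.667 (2 of 3 shared sites agree)
```

Restricting the denominator to shared sites is a design choice: sites missing from one call set are a question of detection sensitivity rather than genotype agreement and would be reported separately.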
Project description:The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention in the scientific community for its potential to yield novel insights in medicine, biochemistry, and evolution. NMRs exhibit unique adaptations that include protracted fertility, cancer resistance, eusociality, and anoxia tolerance. This suite of adaptations is not found in other rodent species, suggesting that interrogating conserved and accelerated regions of the NMR genome may reveal regions fundamental to these adaptations. However, limitations of the current NMR genome assembly make it challenging to study structural variation, heterozygosity, and non-coding adaptations. We present a complete diploid naked mole-rat genome assembly built by integrating long-read and 10X linked-read sequencing of a male NMR and its parents with Hi-C sequencing of NMR hypothalamus (N = 2). Reads were binned as maternal, paternal, or ambiguous with TrioCanu. We then polished the genomes with Flye, Racon, and Medaka. The assemblies were then scaffolded with the following tools, in order: Scaff10X, Salsa2, 3d-DNA, Minimap2 alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and selecting male-biased contigs not present in the maternal genome. These contigs were then assembled using publicly available Hi-C-seq data from male NMR fibroblasts (SRR820318). In both assemblies, the sex chromosome haplotypes were merged so that each assembly contains high-quality X and Y chromosomes. Finally, the assemblies were evaluated with Quast, BUSCO, and Merqury, all of which indicated high base-pair quality and contiguity for both assemblies.
The assembly will next be annotated by Ensembl using public RNA-seq data from multiple tissues (SRP061363). Together with this annotation, the assembly will provide a high-quality resource for the NMR and comparative genomics communities.
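Two steps in this record rest on simple counting logic: trio binning (assigning offspring reads to a parent using k-mers found in only one parent) and flagging Y-chromosome candidates as contigs covered by male but not female reads and absent from the maternal assembly. The minimal sketch below illustrates both ideas. It is a simplification, not TrioCanu's actual scoring or the authors' pipeline: it assigns a read to whichever haplotype contributes more unique k-mer hits, treats parental reads as error-free, and assumes per-contig read depths are already normalized between the male and female libraries. All function names, thresholds, and toy inputs are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def haplotype_specific_kmers(maternal_reads, paternal_reads, k):
    """k-mers seen in only one parent (simplified: no k-mer count filtering)."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring read to the parent contributing more specific k-mers;
    ties (including zero hits on both sides) are left ambiguous."""
    read_kmers = kmers(read, k)
    mat_hits = len(read_kmers & mat_only)
    pat_hits = len(read_kmers & pat_only)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    return "ambiguous"


def male_biased_contigs(male_depth, female_depth, maternal_hits,
                        min_male=5.0, max_female_ratio=0.1):
    """Contigs with substantial male depth, little female depth, and no
    alignment to the maternal assembly (maternal_hits: hypothetical set of
    contig IDs that do align to it). Thresholds are illustrative only."""
    candidates = []
    for contig, m in male_depth.items():
        f = female_depth.get(contig, 0.0)
        if m >= min_male and f <= max_female_ratio * m and contig not in maternal_hits:
            candidates.append(contig)
    return sorted(candidates)


mat_only, pat_only = haplotype_specific_kmers(["ACGTACGA"], ["ACGTTCGA"], 4)
print(bin_read("CGTACG", mat_only, pat_only, 4))  # maternal
print(bin_read("ACGT", mat_only, pat_only, 4))    # ambiguous (k-mer shared by both parents)

male = {"ctg1": 30.0, "ctg2": 15.0, "ctg3": 14.0, "ctg4": 2.0}
female = {"ctg1": 29.0, "ctg2": 0.5, "ctg3": 0.0}
print(male_biased_contigs(male, female, maternal_hits={"ctg1", "ctg3"}))  # ['ctg2']
```

In the toy data, ctg1 is covered equally by both sexes (autosomal-like), ctg3 aligns to the maternal assembly, and ctg4 has too little male depth, leaving ctg2 as the only Y candidate.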
Project description:We sequenced and analyzed the genome of a highly inbred miniature Chinese pig strain, the Banna Minipig Inbred Line (BMI). We conducted whole-genome screening using next-generation sequencing (NGS) technology and performed SNP calling against the Sus scrofa reference genome assembly Sscrofa11.1.