Project description:The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention for its potential to yield new insights in medicine, biochemistry, and evolution. NMRs exhibit unusual adaptations, including protracted fertility, cancer resistance, eusociality, and tolerance of anoxia. This suite of adaptations is not found in other rodent species, suggesting that interrogating conserved and accelerated regions of the NMR genome will identify regions fundamental to these adaptations. However, the current NMR genome assembly has limitations that make studying structural variation, heterozygosity, and non-coding adaptations challenging. We present a complete diploid naked mole-rat genome assembly that integrates long-read and 10X linked-read genome sequencing of a male NMR and its parents with Hi-C sequencing of NMR hypothalamus (N=2). Reads were classified as maternal, paternal, or ambiguous with TrioCanu. We then polished the genomes with Flye, Racon, and Medaka. Assemblies were scaffolded using the following tools in order: Scaff10X, Salsa2, 3d-DNA, Minimap2 alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and finding male-biased contigs not present in the maternal genome. These contigs were assembled using publicly available Hi-C data from male NMR fibroblasts (SRR820318). The sex chromosome haplotypes were merged so that both assemblies contain a high-quality X and Y chromosome. Finally, the assemblies were evaluated with Quast, BUSCO, and Merqury, all of which indicated high base-pair quality and contiguity for both assemblies.
The assembly will next be annotated by Ensembl using public RNA-seq data from multiple tissues (SRP061363). This assembly will provide a high-quality resource to the NMR and comparative genomics communities.
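The description of Y chromosome contig identification (male-biased contigs with little or no female read support) can be illustrated with a small sketch. This is not the authors' pipeline: the function name `male_biased_contigs`, the per-contig mean depth inputs, the median-based normalization, and the `max_ratio` and `min_male_depth` thresholds are all assumptions made for illustration.

```python
from statistics import median


def male_biased_contigs(male_depth, female_depth, max_ratio=0.3, min_male_depth=1.0):
    """Return contigs whose normalized female/male depth ratio is at most max_ratio.

    male_depth and female_depth map contig name -> mean read depth
    (hypothetical inputs, e.g. summarized from alignments to the paternal assembly).
    """
    male_median = median(male_depth.values())
    female_median = median(female_depth.values())
    if male_median <= 0 or female_median <= 0:
        raise ValueError("median depth must be positive in both samples")

    # Rescale female depth so that typical (autosomal) contigs have a ratio near 1.
    scale = male_median / female_median

    flagged = []
    for contig, m in male_depth.items():
        if m < min_male_depth:
            continue  # too little male coverage to judge this contig
        f = female_depth.get(contig, 0.0) * scale
        if f / m <= max_ratio:
            flagged.append(contig)
    return sorted(flagged)
```

Under these assumptions, autosomal contigs have a ratio near 1, X-linked contigs a ratio near 2 (two copies in females versus one in males), and Y-linked contigs a ratio near 0, so only the last group falls below the threshold.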
Project description:Peanut (Arachis hypogaea) has a large (~2.7 Gbp) allotetraploid genome with closely related component genomes, making it very challenging to assemble. Here we report genome sequences of its diploid ancestors (A. duranensis and A. ipaënsis). We show they are similar to the peanut's A- and B-genomes and use them to identify candidate disease resistance genes, create improved tetraploid transcript assemblies, and show genetic exchange between peanut's component genomes. Based on remarkably high DNA identity and biogeography, we conclude that A. ipaënsis may be a descendant of the very same population that contributed the B-genome to cultivated peanut. Whole Genome Bisulphite Sequencing of the peanut species Arachis duranensis and Arachis ipaensis.
Project description:The Global Pandemic Lineage (GPL) of the amphibian pathogen Batrachochytrium dendrobatidis (Bd) has been described as a main driver of amphibian extinctions on nearly every continent. Near-complete genomes of three Bd-GPL strains have enabled studies of the pathogen, but the genomic features that set Bd-GPL apart from other Bd lineages are not well understood due to a lack of high-quality genome assemblies and annotations from other lineages. We used long-read DNA sequencing to assemble high-quality genomes of three Bd-BRAZIL isolates and one non-pathogenic outgroup species, Polyrhizophydium stewartii (Ps) strain JEL0888, and compared these to genomes of previously sequenced Bd-GPL strains. The Bd-BRAZIL assemblies range in size between 22.0 and 26.1 Mb and encode 8495-8620 protein-coding genes per strain. Our pan-genome analysis provided insight into shared and lineage-specific gene content. The core genome of Bd consists of 6278 conserved gene families, with 202 Bd-BRAZIL-specific and 172 Bd-GPL-specific gene families. We discovered gene copy number variation in pathogenicity gene families between Bd-BRAZIL and Bd-GPL strains, though none were consistently expanded in Bd-GPL or Bd-BRAZIL strains. Comparison within the Batrachochytrium genus and with two closely related non-pathogenic saprophytic chytrids identified variation in sequence and protein domain counts. We further tested these new Bd-BRAZIL genomes to assess their utility as reference genomes for transcriptome alignment and analysis. Our analysis examines the genomic variation between strains in Bd-BRAZIL and Bd-GPL and offers insights into the application of these genomes as reference genomes for future studies.
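The distinction between core and lineage-specific gene families can be sketched with simple set logic. The definitions used here are assumptions, since the description does not state the authors' criteria: a family is treated as core if it occurs in every strain, and as lineage-specific if it occurs only in strains of a single lineage. The function name `classify_families` and both input mappings are hypothetical.

```python
def classify_families(presence, lineage_of):
    """Split gene families into core and lineage-specific sets.

    presence:   family name -> set of strains in which the family was found
    lineage_of: strain name -> lineage label (e.g. "Bd-BRAZIL", "Bd-GPL")
    """
    strains = set(lineage_of)
    core = set()
    specific = {lineage: set() for lineage in set(lineage_of.values())}

    for family, found_in in presence.items():
        found_in = set(found_in) & strains  # ignore strains with no lineage label
        if not found_in:
            continue
        if found_in == strains:
            core.add(family)  # present in every strain
            continue
        lineages = {lineage_of[s] for s in found_in}
        if len(lineages) == 1:
            specific[lineages.pop()].add(family)  # confined to one lineage
        # Families shared across lineages but missing from some strains
        # are left unclassified (accessory) in this sketch.
    return core, specific
```

Counting the members of each returned set would give numbers analogous to the 6278 core, 202 Bd-BRAZIL-specific, and 172 Bd-GPL-specific families reported above, although the actual counts depend on the authors' clustering and thresholds.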
Project description:Peanut (Arachis hypogaea) has a large (~2.7 Gbp) allotetraploid genome with closely related component genomes, making it very challenging to assemble. Here we report genome sequences of its diploid ancestors (A. duranensis and A. ipaënsis). We show they are similar to the peanut’s A- and B-genomes and use them to identify candidate disease resistance genes, create improved tetraploid transcript assemblies, and show genetic exchange between peanut’s component genomes. Based on remarkably high DNA identity and biogeography, we conclude that A. ipaënsis may be a descendant of the very same population that contributed the B-genome to cultivated peanut.
Project description:Many crop species have complex genomes, making the conventional pathway to associating molecular markers with trait variation, which includes genome sequencing, both expensive and time-consuming. We used a streamlined approach to rapidly develop a genomics platform for hexaploid wheat based on the inferred order of expressed sequences. This involved assembly of the transcriptomes for the progenitor genomes of bread wheat, the development of a genetic linkage map comprising 9495 mapped transcriptome-based SNP markers, use of this map to rearrange the genome sequence of Brachypodium distachyon into pseudomolecules representative of the genome organization of wheat and sequence similarity-based mapping onto this resource of the transcriptome assemblies. To demonstrate that this approximation of gene order in wheat is appropriate to underpin association genetics analysis, we undertook Associative Transcriptomics for straw biomass traits, identifying associations and even candidate genes for height, weight and width.
Project description:Here we present the first whole-genome assemblies of Arabidopsis thaliana strains since the release of the 125 Mb reference genome sequence a decade ago. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone.
2011-02-11 | GSE24569 | GEO
Project description:Phylogeny of Seirinae based on complete mitochondrial genomes