Project description:The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention in the scientific community for its potential to yield novel insights in medicine, biochemistry, and evolution. NMRs exhibit unique adaptations that include protracted fertility, cancer resistance, eusociality, and anoxia tolerance. This suite of adaptations is not found in other rodent species, suggesting that interrogating conserved and accelerated regions in the NMR genome will reveal regions fundamental to these adaptations. However, the current NMR genome assembly has limitations that make studying structural variation, heterozygosity, and non-coding adaptations challenging. We present a complete diploid naked mole-rat genome assembly produced by integrating long-read and 10X linked-read genome sequencing of a male NMR and its parents, and Hi-C sequencing in the NMR hypothalamus (N=2). Reads were identified as maternal, paternal or ambiguous (TrioCanu). We then polished genomes with Flye, Racon and Medaka. Assemblies were then scaffolded using the following tools in order: Scaff10X, Salsa2, 3d-DNA, Minimap2 alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and finding male-biased contigs not present in the maternal genome. These contigs were assembled with publicly available male NMR fibroblast Hi-C-seq data (SRR820318). The sex chromosome haplotypes were merged so that both assemblies have a high-quality X and Y chromosome. Finally, assemblies were evaluated with Quast, BUSCO, and Merqury, all of which indicated high base-pair quality and contiguity for both assemblies.
The assembly will next be annotated by Ensembl using public RNA-seq data from multiple tissues (SRP061363). Together, this assembly will provide a high-quality resource to the NMR and comparative genomics communities.
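The male-biased contig search described above can be illustrated with a minimal sketch. It assumes per-contig mean read depths have already been computed for the male and female linked-read alignments; the function name `male_biased_contigs`, the input dictionaries, and both thresholds are hypothetical and are not taken from the project's actual pipeline.

```python
from statistics import median
from typing import Dict, List


def male_biased_contigs(
    male_depth: Dict[str, float],
    female_depth: Dict[str, float],
    min_male_norm: float = 0.25,       # hypothetical: Y is haploid, so expect ~0.5x
    max_female_to_male: float = 0.1,   # hypothetical: near-zero female coverage
) -> List[str]:
    """Return contigs covered by male reads but (almost) not by female reads.

    Depths are normalized by each sample's median contig depth so that the
    two libraries are comparable even if sequenced to different depths.
    """
    male_med = median(male_depth.values())
    female_med = median(female_depth.values())
    flagged = []
    for contig, m in male_depth.items():
        m_norm = m / male_med
        f_norm = female_depth.get(contig, 0.0) / female_med
        if m_norm < min_male_norm:
            continue  # too little male coverage to judge
        if f_norm / m_norm <= max_female_to_male:
            flagged.append(contig)
    return sorted(flagged)


# Toy depths (invented for illustration): ctg3 behaves like an X-linked
# contig (half depth in the male), ctg4 like a Y-linked contig.
male = {"ctg1": 30.0, "ctg2": 31.0, "ctg3": 15.0, "ctg4": 14.0, "ctg5": 29.0}
female = {"ctg1": 30.0, "ctg2": 29.0, "ctg3": 30.0, "ctg4": 0.5, "ctg5": 31.0}
print(male_biased_contigs(male, female))
```

In this toy input only `ctg4` is flagged: the X-like contig has reduced male depth but full female depth, so its female-to-male ratio stays well above the cutoff.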
Project description:We developed a reliable copy number variation (CNV) detection method for species lacking a contiguous reference genome. We selected multiple probes within 14,078 gene sequences and developed array-based comparative genome hybridization. Gene CNVs were assessed in three full-sib families from species with 20 Gb genomes, i.e. white spruce, black spruce, and interior spruce, a natural hybrid. Results: We discovered hundreds of gene CNVs in each species, 3612 in total, which were enriched in functions related to stress and defense responses and in narrow expression profiles, indicating a potential role in adaptation. The number of shared CNVs was in accordance with the degree of relatedness between individuals and species. The genetically mapped subset of these genes showed a wide distribution across the genome, implying numerous structural variations. The hybrid family presented significantly fewer CNVs, suggesting that the admixture of two species within one genome reduces the occurrence of CNVs.
2016-12-14 | GSE92329 | GEO
Project description:Complete mitochondrial genome sequencing of Felidae family species. Genome sequencing
Project description:We report the first use of next-generation massively parallel sequencing and de novo transcriptome assembly to gain insight into the transcriptome of Hevea brasiliensis. Sequencing generated more than 12 million reads with an average length of 90 nt. In total, 48,768 unigenes (mean size = 488 bp) were assembled de novo, more than three times the number of Hevea brasiliensis sequences deposited in GenBank. Assembled sequences were annotated with gene descriptions, gene ontology terms and clusters of orthologous groups terms. In total, 37,373 unigenes were successfully annotated, and more than 10% of unigenes were aligned to known proteins of Euphorbiaceae. The unigenes contain a nearly complete collection of known rubber-synthesis-related genes. Our data provide the most comprehensive sequence resource available for studying the rubber tree and demonstrate the feasibility of Illumina sequencing and de novo transcriptome assembly in a species lacking genome information. The transcriptome of latex and leaf in Hevea brasiliensis
Project description:The majority of bacterial genomes have high coding efficiencies, but a few genomes of intracellular bacteria have low gene density. The genome of the endosymbiont Sodalis glossinidius contains almost 50% pseudogenes, which carry mutations that putatively silence them at the genomic level. We applied multiple omic strategies, combining single-molecule DNA sequencing and annotation, stranded RNA sequencing and proteome analysis, to better understand the transcriptional and translational landscape of Sodalis pseudogenes and potential mechanisms for their control. Between 53% and 74% of the Sodalis transcriptome remains active in cell-free culture. Mean sense transcription from Coding Domain Sequences (CDS) is four times greater than that from pseudogenes. Core-genome analysis of six Illumina-sequenced Sodalis isolates from different host Glossina species shows that pseudogenes make up ~40% of the 2,729 genes in the core genome, suggesting that they are stable and/or that Sodalis is a recent introduction across the Glossina genus as a facultative symbiont. These data shed further light on the importance of transcriptional and translational control in deciphering host-microbe interactions, and demonstrate that pseudogenes are more complex than simple degrading DNA sequences. For this reason, we show that combining genomics, transcriptomics and proteomics represents an important resource for studying prokaryotic genomes with a view to elucidating evolutionary adaptation to novel environmental niches.
Project description:The domestic goat, Capra hircus (2n=60), is one of the most important domestic livestock species in the world. Here we report its high-quality reference genome, generated by combining Illumina short-read sequencing with a new automated, high-throughput whole-genome mapping system based on optical mapping technology, which was used to generate extremely long super-scaffolds. The N50 sizes of contigs, scaffolds, and super-scaffolds for the sequence assembly reported herein are 18.7 kb, 3.06 Mb, and 18.2 Mb, respectively. Almost 95% of the super-scaffolds are anchored on chromosomes based on conserved syntenic information with cattle. The assembly is strongly supported by the RH map of goat chromosome 1. We annotated 22,175 protein-coding genes, most of which are recovered by RNA-seq data from ten tissues. Rapidly evolving genes and gene families are enriched in metabolism and immune systems, consistent with the fact that the goat is one of the most adaptable and geographically widespread livestock species. Comparative transcriptomic analysis of the primary and secondary follicles of a cashmere goat revealed 51 genes that were significantly differentially expressed between the two types of hair follicles. This study not only provides a high-quality reference genome for an important livestock species, but also shows that the new automated optical mapping technology can be used in de novo assembly of large genomes. Corresponding whole genome sequencing is available in NCBI BioProject PRJNA158393. We sequenced a 3-year-old female Yunnan black goat and constructed a reference sequence for this breed. To improve the quality of gene models, RNA samples from ten tissues (Bladder, Brain, Heart, Kidney, Liver, Lung, Lymph, Muscle, Ovary, Spleen) were extracted from the same goat that was sequenced.
To investigate the genic basis underlying the development of cashmere fibers using the goat reference genome assembly and annotated genes, we extracted RNA samples from primary and secondary hair follicles of three Inner Mongolia cashmere goats and conducted transcriptome sequencing and DGE analysis. This submission represents the RNA-Seq component of the study.
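The N50 values quoted for contigs, scaffolds, and super-scaffolds follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation is shown here; the function name `n50` and the toy lengths are illustrative assumptions, not data from the goat assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length
    raise AssertionError("unreachable: running total always reaches total")


# Toy example: total length 290, so half is 145; 80 + 70 = 150 reaches it.
print(n50([80, 70, 50, 40, 30, 20]))
```

Comparing `2 * running` against `total` avoids floating-point division, which matters little for toy inputs but keeps the comparison exact for assemblies with very large total lengths.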
Project description:Many crop species have complex genomes, making the conventional pathway to associating molecular markers with trait variation, which includes genome sequencing, both expensive and time-consuming. We used a streamlined approach to rapidly develop a genomics platform for hexaploid wheat based on the inferred order of expressed sequences. This involved assembly of the transcriptomes for the progenitor genomes of bread wheat, the development of a genetic linkage map comprising 9495 mapped transcriptome-based SNP markers, use of this map to rearrange the genome sequence of Brachypodium distachyon into pseudomolecules representative of the genome organization of wheat and sequence similarity-based mapping onto this resource of the transcriptome assemblies. To demonstrate that this approximation of gene order in wheat is appropriate to underpin association genetics analysis, we undertook Associative Transcriptomics for straw biomass traits, identifying associations and even candidate genes for height, weight and width.