Project description:Interventions: Genomic test: CANCERPLEX-JP, OncoGuide NCC Oncopanel System, FoundationOne CDx genome profile, GUARDANT360, MSI Analysis System, BRACAnalysis
Primary outcome(s): Development of genome database
Study Design: Single arm Non-randomized
Project description:The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention for its potential to yield insights into medicine, biochemistry, and evolution. NMRs exhibit unique adaptations that include protracted fertility, cancer resistance, eusociality, and tolerance to anoxia. This suite of adaptations is not found in other rodent species, suggesting that interrogating conserved and accelerated regions of the NMR genome could reveal regions fundamental to these adaptations. However, the current NMR genome assembly has limitations that make studying structural variations, heterozygosity, and non-coding adaptations challenging. We present a complete diploid naked mole-rat genome assembly built by integrating long-read and 10X linked-read genome sequencing of a male NMR and its parents with Hi-C sequencing of the NMR hypothalamus (N=2). Reads were identified as maternal, paternal, or ambiguous (TrioCanu). We then polished the genomes with Flye, Racon, and Medaka. Assemblies were scaffolded using the following tools in order: Scaff10X, Salsa2, 3d-DNA, Minimap2 alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and finding male-biased contigs not present in the maternal genome. Contigs were assembled with publicly available male NMR fibroblast Hi-C-seq data (SRR820318). The sex chromosome haplotypes were merged so that both assemblies have a high-quality X and Y chromosome. Finally, assemblies were evaluated with Quast, BUSCO, and Merqury, all of which indicated high base-pair quality and contiguity for both assemblies.
The assembly will next be annotated by Ensembl using public RNA-seq data from multiple tissues (SRP061363). Together, this assembly will provide a high-quality resource to the NMR and comparative genomics communities.
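The male-biased contig search described above can be illustrated with a minimal Python sketch. It assumes that per-contig mean read depths for the male and female 10X libraries have already been computed and normalized for library size; the function name, the input dictionaries, and both thresholds are hypothetical and are not taken from the project's actual pipeline.

```python
from typing import Dict, List


def male_biased_contigs(
    male_depth: Dict[str, float],
    female_depth: Dict[str, float],
    min_male_depth: float = 5.0,      # hypothetical threshold
    max_female_ratio: float = 0.1,    # hypothetical threshold
) -> List[str]:
    """Return contigs covered by male reads but nearly absent from female reads.

    Depths are assumed to be normalized mean coverages per contig.
    """
    hits = []
    for contig, m in male_depth.items():
        f = female_depth.get(contig, 0.0)  # a missing contig counts as zero coverage
        # Require real male coverage and female coverage at most a small fraction of it.
        if m >= min_male_depth and f <= max_female_ratio * m:
            hits.append(contig)
    return sorted(hits)


if __name__ == "__main__":
    male = {"ctgA": 20.0, "ctgB": 18.0, "ctgC": 1.0}
    female = {"ctgA": 21.0, "ctgB": 0.5, "ctgC": 0.0}
    print(male_biased_contigs(male, female))
```

In this toy input only `ctgB` passes both filters: `ctgA` is covered equally in both sexes, and `ctgC` has too little male coverage to be called.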
Project description:We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long-reads and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb of raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from three different tissue types from three other squid species (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome.
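For readers unfamiliar with the scaffold N50 statistic quoted above, the following Python sketch shows the standard definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative only and are not derived from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold where the running sum covers half the assembly.
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one length")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

With the toy lengths 2, 3, 4, 5, and 6 (total 20), the two longest scaffolds sum to 11, which passes the halfway mark of 10, so the N50 is 5.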
Project description:<p><strong>BACKGROUND:</strong> <em>Peucedanum praeruptorum</em> Dunn (Apiaceae) has long been used in traditional Chinese medicine. Various coumarins, including the major root constituents praeruptorins A-E, are the active constituents of the dried roots of <em>P. praeruptorum</em>. Previous transcriptomic and metabolomic studies attempted to elucidate the distribution and biosynthetic network of these medicinally valuable compounds. However, the lack of a high-quality reference genome impedes an in-depth understanding of genetic traits and, thus, the development of better breeding strategies.</p><p><strong>RESULTS:</strong> The authors assembled a telomere-to-telomere (T2T) genome by combining PacBio HiFi, ONT ultra-long and Hi-C data. The final genome assembly was approximately 1.798 Gb, assigned to 11 chromosomes, with genome completeness >98%. Comparative genomic analysis suggested that <em>P. praeruptorum</em> experienced two whole-genome duplication (WGD) events, similar to those in the Apiaceae family. Through transcriptomic and metabolomic analysis of the coumarin metabolic pathway, the authors characterized the spatial and temporal distribution of coumarins and the expression patterns of critical genes for their biosynthesis. Notably, the <em>COSY</em> and cytochrome <em>P450</em> genes showed tandem duplications on several chromosomes, which may be responsible for the high accumulation of coumarins.</p><p><strong>CONCLUSIONS:</strong> The authors obtained a T2T genome for <em>P. praeruptorum</em>, which provides molecular insights into the chromosomal distribution of the coumarin biosynthetic genes. This high-quality genome is an essential resource for designing engineering strategies to improve the production of these valuable compounds.</p>
Project description:<p>The section <em>Oleifera</em> (Theaceae) has attracted attention for the high levels of unsaturated fatty acids found in its seeds. Here, we report the chromosome-scale genome of the sect. <em>Oleifera</em> using diploid wild <em>Camellia lanceoleosa</em> with a final size of 3.00 Gb and an N50 scaffold size of 186.43 Mb. Repetitive sequences accounted for 80.63% and were distributed unevenly across the genome. <em>Camellia lanceoleosa</em> underwent a whole-genome duplication event approximately 65 million years ago (65 Mya), prior to the divergence of <em>C</em>. <em>lanceoleosa</em> and <em>Camellia sinensis</em> (approx. 6-7 Mya). Syntenic comparisons of these two species revealed genomic rearrangements that appear to be driven in part by the activity of transposable elements. The expanded and positively selected genes in <em>C</em>. <em>lanceoleosa</em> were significantly enriched in oil biosynthesis, and the expansion of homomeric <em>acetyl-coenzyme A carboxylase</em> (<em>ACCase</em>) genes and the seed-biased expression of genes encoding heteromeric ACCase, diacylglycerol acyltransferase, glyceraldehyde-3-phosphate dehydrogenase and stearoyl-ACP desaturase could be of primary importance for the high oil and oleic acid content found in <em>C. lanceoleosa</em>. Theanine and catechins were present in the leaves of <em>C</em>. <em>lanceoleosa</em>. However, caffeine could not be detected in the leaves but was abundant in the seeds and roots. The functional and transcriptional divergence of genes encoding SAM-dependent <em>N</em>-methyltransferases may be associated with caffeine accumulation and distribution. Gene expression profiles, structural composition and chromosomal location suggest that the late-acting self-incompatibility of <em>C. lanceoleosa</em> is likely to have favoured a novel mechanism co-occurring with gametophytic self-incompatibility.
This study provides valuable resources for quantitative and qualitative improvements and genome assembly of polyploid plants in sect. <em>Oleifera</em>.</p>