Project description: Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10X Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, and contains 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29-78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn.
In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.
Project description: Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read and high-throughput chromosome conformation capture (Hi-C) sequencing technologies. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes, with a contig N50 of 16.01 Mb and a scaffold N50 of 180.46 Mb. The genome achieved a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events, including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.
Project description: Cipangopaludina cathayensis (Gastropoda: Prosobranchia; Mesogastropoda; Viviparidae) is widely distributed in the freshwater habitats of China. It is an economically important snail with high edible and medicinal value. However, genomic resources and a reference genome for this snail are lacking. In this study, we assembled the first chromosome-level genome of C. cathayensis. The preliminary genome assembly was 1.48 Gb in size, with a contig N50 of 93.49 Mb. The assembled sequences were anchored to nine pseudochromosomes using Hi-C data. The final genome after Hi-C correction was 1.48 Gb, with a contig N50 of 98.49 Mb and a scaffold N50 of 195.21 Mb. The chromosome anchoring rate was 99.99%. A total of 22,702 protein-coding genes were predicted. Phylogenetic analyses indicated that C. cathayensis diverged from Bellamya purificata approximately 158.10 million years ago. There were 268 expanded and 505 contracted gene families in C. cathayensis when compared with its most recent common ancestor. Five putative genes under positive selection in C. cathayensis were identified (false discovery rate <0.05). These genome data provide a valuable resource for evolutionary studies of the family Viviparidae, and for the genetic improvement of C. cathayensis.
Project description: Background: Ants with complex societies have fascinated scientists for centuries. Comparative genomic and transcriptomic analyses across ant species and castes have revealed important insights into the molecular mechanisms underlying ant caste differentiation. However, most current ant genomes and transcriptomes are highly fragmented and incomplete, which hinders our understanding of the molecular basis for complex ant societies. Findings: By hybridizing Illumina, Pacific Biosciences, and Hi-C sequencing technologies, we de novo assembled a chromosome-level genome for Monomorium pharaonis, with a scaffold N50 of 27.2 Mb. Our new assembly provides better resolution for the discovery of genome rearrangement events at the chromosome level. Analysis of full-length isoform sequencing (ISO-seq) suggested that ~15 Gb of ISO-seq data were sufficient to cover most expressed genes, but the number of transcript isoforms steadily increased with sequencing data coverage. Our high-depth ISO-seq data greatly improved the quality of gene annotation and enabled the accurate detection of alternative splicing isoforms in different castes of M. pharaonis. Comparative transcriptome analysis across castes based on the ISO-seq data revealed an unprecedented number of transcript isoforms, including many caste-specific isoforms. We also identified a number of conserved long non-coding RNAs that evolved specifically in ant lineages and several that were conserved across insect lineages. Conclusions: We produced a high-quality chromosome-level genome for M. pharaonis, which significantly improved previous short-read assemblies. Together with full-length transcriptomes for all castes, we generated a highly accurate annotation for this ant species. These long-read sequencing results provide a useful resource for future functional studies on the genetic mechanisms underlying the evolution of social behaviors and organization in ants.
Project description: Rosa rugosa is an important shrub with economic, ecological, and pharmaceutical value. A high-quality chromosome-scale genome of R. rugosa was assembled using PacBio and Hi-C technologies. The final assembly was about 407.1 Mb in size, with a contig N50 of 2.85 Mb and a scaffold N50 of 56.6 Mb. More than 98% of the assembled genome sequences were anchored to seven pseudochromosomes (402.9 Mb). The genome contained 37,512 protein-coding genes, of which 37,016 (98.68%) were functionally annotated, and 206.67 Mb (50.76%) of the assembled sequences are repetitive. Phylogenetic analyses indicated that R. rugosa diverged from Rosa chinensis ∼6.6 million years ago, and no lineage-specific whole-genome duplication event occurred after divergence from R. chinensis. Chromosome synteny analysis demonstrated highly conserved synteny between R. rugosa and R. chinensis, as well as between R. rugosa and Prunus persica. Comparative genome and transcriptome analysis revealed genes related to colour, scent, and environmental adaptation. The chromosome-level reference genome provides important genomic resources for molecular-assisted breeding and horticultural comparative genomics research.
Project description:Oryza coarctata (2n = 4X = 48, KKLL) is an allotetraploid, undomesticated relative of rice and the only species in the genus Oryza with tolerance to high salinity and submergence. Therefore, it contains important stress and tolerance genes/factors for rice. The initial draft genome published was limited by data and technical restrictions, leading to an incomplete and highly fragmented assembly. This study reports a new, highly contiguous chromosome-level genome assembly and annotation of O. coarctata. PacBio high-quality HiFi reads generated 460 contigs with a total length of 573.4 Mb and an N50 of 23.1 Mb, which were assembled into scaffolds with Hi-C data, anchoring 96.99% of the assembly onto 24 chromosomes. The genome assembly comprises 45,571 genes, and repetitive content contributes 25.5% of the genome. This study provides the novel identification of the KK and LL genome types of the genus Oryza, leading to valuable insights into rice genome evolution. The chromosome-level genome assembly of O. coarctata is a valuable resource for rice research and molecular breeding.
Project description: Andrographis paniculata (Chinese name: Chuanxinlian) is an annual dicotyledonous medicinal plant widely grown in China and Southeast Asia. The dried plant is highly regarded in traditional Chinese medicine for its antipyretic, anti-inflammatory, and analgesic effects. To help delineate the biosynthetic pathways of various secondary metabolites, we report in this study a high-quality reference genome for A. paniculata. Using PacBio single-molecule real-time sequencing, with Illumina sequencing reads for error correction, the A. paniculata genome was assembled into a total size of 284 Mb with a contig N50 of 5.14 Mb. The contigs were further assembled into 24 pseudo-chromosomes by the Hi-C technique. We also analyzed the gene families (e.g., KSL and CYP450) whose protein products are essential for synthesizing bioactive compounds in A. paniculata. In conclusion, the high-quality A. paniculata genome assembly builds the foundation for decoding the biosynthetic pathways of various medicinal compounds.
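The descriptions above all report contig or scaffold N50 values as a contiguity measure. As a minimal sketch, assuming the standard definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), the statistic can be computed as follows. The function name `n50` and the toy lengths are illustrative only and are not taken from any of the projects described.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches at least half
    of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example with hypothetical contig lengths (arbitrary units).
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

Applied to the lengths of all contigs in an assembly this gives the contig N50, and applied to scaffold lengths it gives the scaffold N50, which is why the two values can differ widely within one assembly.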