Project description:Sorghum bicolor (L.) Moench is a significant grass crop globally, known for its genetic diversity. High-quality genome sequences are needed to capture this diversity. We constructed high-quality, chromosome-level genome assemblies for two vital sorghum inbred lines, Tx2783 and RTx436. Using advanced single-molecule techniques, long-read sequencing and optical maps, we improved average sequence continuity 19-fold and 11-fold relative to the existing BTx623 v3.0 reference genome and obtained 19 and 18 scaffolds (N50 of 25.6 and 14.4) for Tx2783 and RTx436, respectively. Our gene annotation efforts resulted in 29 612 protein-coding genes for the Tx2783 genome and 29 265 protein-coding genes for the RTx436 genome. Comparative analyses with 26 plant genomes, which included 18 sorghum genomes and 8 outgroup species, identified around 31 210 protein-coding gene families, with about 13 956 specific to sorghum. Using representative models from gene trees across the 18 sorghum genomes, a total of 72 579 pan-genes were identified, with 14% core, 60% softcore and 26% shell genes. We identified 99 genes in Tx2783 and 107 genes in RTx436 that showed functional enrichment specifically in binding and metabolic processes, as revealed by the GO enrichment Pearson Chi-Square test. We detected 36 potential large inversions in the comparison between the BTx623 Bionano map and the BTx623 v3.1 reference sequence. Strikingly, these inversions were absent when comparing Tx2783 or RTx436 with the BTx623 Bionano map. These inversions were mostly in the pericentromeric region, which is known to contain low-complexity regions and to be harder to assemble, suggesting the presence of potential artifacts in the public BTx623 reference assembly. Furthermore, in comparison to Tx2783, RTx436 exhibited 324 883 additional Single Nucleotide Polymorphisms (SNPs) and 16 506 more Insertions/Deletions (INDELs) when using BTx623 as the reference genome. We also characterized approximately 348 nucleotide-binding leucine-rich repeat (NLR) disease resistance genes in the two genomes. These high-quality genomes serve as valuable resources for discovering agronomic traits and studying structural variation.
Project description:Quinoa is emerging as a key seed crop for global food security due to its ability to grow in marginal environments and its excellent nutritional properties. Because quinoa is partially allogamous, we have developed quinoa inbred lines necessary for molecular genetic analysis. Our comprehensive genomic analysis showed that the quinoa inbred lines fall into three genetic subpopulations: northern highland, southern highland, and lowland. Lowland and highland quinoa are the same species, but have very different genotypes and phenotypes. Lowland quinoa has relatively small grains and a darker grain color, and is widely tested and grown around the world. In contrast, the white, large-grained highland quinoa is grown in the Andean highlands, including the region where quinoa originated, and is exported worldwide as high-quality quinoa. Recently, we have shown that viral vectors can be used to regulate endogenous genes in quinoa, paving the way for functional genomics to reveal the diversity of quinoa. However, although a high-quality assembly has recently been reported for a lowland quinoa line, genomic resources of the quality required for functional genomics are not available for highland quinoa lines. Here we present high-quality chromosome-level genome assemblies for two highland inbred quinoa lines, J075 representing the northern highland line and J100 representing the southern highland line, using PacBio HiFi sequencing and dpMIG-seq. In addition, we demonstrate the importance of verifying and correcting reference-based scaffold assembly with other approaches such as linkage maps. The assembled genome sizes of J075 and J100 are 1.29 and 1.32 Gb, with contig N50 of 66.3 and 12.6 Mb, and scaffold N50 of 71.2 and 70.6 Mb, respectively, comprising 18 pseudochromosomes. The repetitive sequences of J075 and J100 represent 72.6% and 71.5% of the genome, the majority of which are long terminal repeats, representing 44.0% and 42.7% of the genome, respectively. The de novo assembled genomes of J075 and J100 were predicted to contain 65,303 and 64,945 protein-coding genes, respectively. The high-quality genomes of these highland quinoa lines will facilitate functional genomics research on quinoa and contribute to the identification of key genes involved in environmental adaptation and quinoa domestication.
Project description:Background: Accurate and complete reference genome assemblies are fundamental for biological research. Cucumber is an important vegetable crop and model system for sex determination and vascular biology. Low-coverage Sanger sequences and high-coverage short Illumina sequences have been used to assemble draft cucumber genomes, but the incompleteness and low quality of these genomes limit their use in comparative genomics and genetic research. A high-quality and complete cucumber genome assembly is therefore essential. Findings: We assembled single-molecule real-time (SMRT) long reads to generate an improved cucumber reference genome. This version contains 174 contigs with a total length of 226.2 Mb and an N50 of 8.9 Mb, and provides 29.0 Mb more sequence data than previous versions. Using 10X Genomics and high-throughput chromosome conformation capture (Hi-C) data, 89 contigs (∼211.0 Mb) were directly linked into 7 pseudo-chromosome sequences. The newly assembled regions show much higher guanine-cytosine or adenine-thymine content than found previously and are likely to have been inaccessible to Illumina sequencing. The new assembly contains 1,374 full-length long terminal retrotransposons and 1,078 novel genes, including 239 tandemly duplicated genes. For example, we found 4 tandemly duplicated tyrosylprotein sulfotransferases, in contrast to the single copy of the gene found previously and in most other plants. Conclusion: This high-quality genome presents novel features of the cucumber genome and will serve as a valuable resource for genetic research in cucumber and plant comparative genomics.
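The contig and scaffold N50 values quoted in these descriptions follow the usual definition: the length of the sequence at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The sketch below is a minimal illustration of that definition, not the pipeline used by any of the projects; the function name `n50` and the toy contig lengths are assumptions made for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (same unit as the input)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest sequences first
    half_total = sum(ordered) / 2             # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first sequence that crosses the halfway mark
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

With real data the lengths would typically be read from the assembly FASTA or its index, and the result is in the same unit as the input (bp, kb or Mb).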
Project description:The native, perennial shrub American hazelnut (Corylus americana) is cultivated in the Midwestern United States for its significant ecological benefits, as well as its high-value nut crop. Implementation of modern breeding methods and quantitative genetic analyses of C. americana requires high-quality reference genomes, a resource that is currently lacking. We therefore developed the first chromosome-scale assemblies for this species using the accessions 'Rush' and 'Winkler'. Genomes were assembled using HiFi PacBio reads and Arima Hi-C data, and Oxford Nanopore reads and a high-density genetic map were used to perform error correction. N50 scores are 31.9 Mb and 35.3 Mb, with 90.2% and 97.1% of the total genome assembled into the 11 pseudomolecules, for 'Rush' and 'Winkler', respectively. Gene prediction was performed using custom RNAseq libraries and protein homology data. 'Rush' has a BUSCO score of 99.0 for its assembly and 99.0 for its annotation, while 'Winkler' has corresponding scores of 96.9 and 96.5, indicating high-quality assemblies. These two independent assemblies enable unbiased assessment of structural variation within C. americana, as well as patterns of syntenic relationships across the Corylus genus. Furthermore, we identified high-density SNP marker sets from genotyping-by-sequencing data using 1343 C. americana, C. avellana and C. americana × C. avellana hybrids, in order to assess population structure in natural and breeding populations. Finally, the transcriptomes of these assemblies, as well as several other recently published Corylus genomes, were utilized to perform phylogenetic analysis of sporophytic self-incompatibility (SSI) in hazelnut, providing evidence of unique molecular pathways governing self-incompatibility in Corylus.
Project description:Asparagus kiusianus is a disease-resistant dioecious plant species and a wild relative of garden asparagus (Asparagus officinalis). To enhance A. kiusianus genomic resources, advance plant science, and facilitate asparagus breeding, we determined the genome sequences of the male and female lines of A. kiusianus. Genome sequence reads obtained with a linked-read technology were assembled into four haplotype-phased contig sequences (∼1.6 Gb each) for the male and female lines. The contig sequences were aligned onto the chromosome sequences of garden asparagus to construct pseudomolecule sequences. Approximately 55,000 potential protein-encoding genes were predicted in each genome assembly, and ∼70% of the genome sequence was annotated as repetitive. Comparative analysis of the genomes of the two species revealed structural and sequence variants between them as well as between the male and female lines of each species. Genes with high sequence similarity to the male-specific sex determinant gene in A. officinalis, MSE1/AoMYB35/AspTDF1, were present in the genomes of the male line but absent from the female genome assemblies. Overall, the genome sequence assemblies, gene sequences, and structural and sequence variants determined in this study will reveal the genetic mechanisms underlying sexual differentiation in plants, and will accelerate disease-resistance breeding in garden asparagus.
Project description:Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving, for the human genome, 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.
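The first step named above, assigning scaffolds to chromosome groups from Hi-C contacts, can be illustrated with a toy clustering. The sketch below is not the authors' algorithm: it is a minimal greedy agglomerative clustering, assuming that the number of chromosome groups is known in advance and that contact counts are normalised by the product of contig lengths (both assumptions of this example). The function name `cluster_contigs` and all toy values are hypothetical.

```python
from itertools import combinations


def cluster_contigs(lengths, links, n_groups):
    """Greedily merge contigs into n_groups by average normalised Hi-C link density.

    lengths: dict mapping contig name -> length (used only for normalisation)
    links:   dict mapping frozenset({a, b}) -> number of Hi-C read pairs joining a and b
    """
    groups = [{name} for name in lengths]     # start with one group per contig

    def density(g1, g2):
        # Average length-normalised link count over all contig pairs between two groups.
        total = sum(
            links.get(frozenset((a, b)), 0) / (lengths[a] * lengths[b])
            for a in g1
            for b in g2
        )
        return total / (len(g1) * len(g2))

    while len(groups) > n_groups:
        # Merge the pair of groups with the highest average link density (i < j always).
        i, j = max(
            combinations(range(len(groups)), 2),
            key=lambda pair: density(groups[pair[0]], groups[pair[1]]),
        )
        groups[i] |= groups[j]
        del groups[j]
    return groups


# Toy data (hypothetical): contigs a, b, c share many contacts, as do d and e.
toy_lengths = {"a": 100, "b": 120, "c": 80, "d": 90, "e": 110}
toy_links = {
    frozenset(("a", "b")): 600,
    frozenset(("b", "c")): 500,
    frozenset(("a", "c")): 300,
    frozenset(("d", "e")): 550,
    frozenset(("a", "d")): 20,
    frozenset(("b", "e")): 30,
    frozenset(("c", "d")): 10,
}
print(sorted(sorted(group) for group in cluster_contigs(toy_lengths, toy_links, 2)))
```

The sketch covers only the grouping step; ordering and orienting scaffolds within each group would need additional logic that is not shown here.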
Project description:Trichoptera is one of the most evolutionarily successful aquatic insect lineages and is highly valued in adaptive evolution research. This study presents the chromosome-level genome assemblies of Himalopsyche anomala and Eubasilissa splendida achieved using PacBio, Illumina, and Hi-C sequencing. For H. anomala and E. splendida, assembly sizes were 663.43 and 859.28 Mb, with scaffold N50 lengths of 28.44 and 31.17 Mb, respectively. In H. anomala and E. splendida, we anchored 24 and 29 pseudochromosomes, and identified 11,469 and 10,554 protein-coding genes, respectively. The high-quality genomes of H. anomala and E. splendida provide critical genomic resources for understanding the evolution and ecology of Trichoptera and performing comparative genomics analyses.
Project description:The recent development of ecological studies has been fueled by the introduction of massive information based on chromosome-scale genome sequences, even for species for which genetic linkage is not accessible. This was enabled mainly by the application of Hi-C, a method for genome-wide chromosome conformation capture that was originally developed for investigating long-range chromatin interactions. Performing genomic scaffolding using Hi-C data is highly resource-demanding and employs elaborate laboratory steps for sample preparation. It starts with building a primary genome sequence assembly as an input, which is followed by computation for genome scaffolding using Hi-C data, requiring careful validation. This article presents technical considerations for obtaining optimal Hi-C scaffolding results and provides a test case of its application to a reptile species, the Madagascar ground gecko (Paroedura picta). Among the metrics that are frequently used for evaluating scaffolding results, we investigate the validity of the completeness assessment of chromosome-scale genome assemblies using single-copy reference orthologues.
Project description:Mosses compose one of the three lineages of bryophytes. Today, about 13,000 species of mosses are recognized from across the globe, and at least one-third of this diversity composes the Hypnales, a lineage characterized by an early rapid radiation. We sequenced and de novo assembled the genomes of two hypnalean mosses, namely Entodon seductrix and Hypnum curvifolium, based on 10x Genomics and Hi-C data. The genome assemblies of E. seductrix and H. curvifolium comprise 348.4 and 262.0 Mb, respectively, estimated by k-mer analyses to represent 93.3% and 97.2% of their total genome size. Both genomes were assembled at the chromosome level, with scaffold N50 of 30.0 and 20.7 Mb, respectively. The annotated genome of E. seductrix comprises 25,801 protein-coding genes and that of H. curvifolium 29,077, estimated to represent 96.8% and 97.2%, respectively, of the total gene spaces based on BUSCO (Benchmarking Universal Single-Copy Ortholog) assessment. For both genomes, most contigs were anchored to the largest 11 pseudomolecules, corresponding to the 11 chromosomes of the two species, with each genome having a putative sex-related chromosome characterized by low gene density. The chromosomes of E. seductrix and H. curvifolium are highly syntenic, suggesting that limited architectural shifts occurred following the rapid radiation of the Hypnales. We compared their genomic features to the model moss Physcomitrium patens. The hypnalean moss genomes lack signatures of recent whole-genome duplication. The presented high-quality moss genomes provide new resources for comparative genomics to potentially unveil the genomic evolution of derived moss lineages.
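The assembly sizes and the k-mer-based coverage fractions reported for the two mosses imply an estimated total genome size for each species. The short sketch below simply divides one by the other; it is a back-of-the-envelope check, not the k-mer analysis itself, and the function name `implied_genome_size` is an assumption of this example.

```python
def implied_genome_size(assembly_mb, fraction_covered):
    """Estimated total genome size (Mb) implied by an assembly size and the fraction it represents."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return assembly_mb / fraction_covered


# Values taken from the description above: assembly size in Mb, fraction of genome represented.
reported = {
    "E. seductrix": (348.4, 0.933),
    "H. curvifolium": (262.0, 0.972),
}
for species, (assembly_mb, fraction) in reported.items():
    print(f"{species}: about {implied_genome_size(assembly_mb, fraction):.1f} Mb")
```

Under this reading, the implied genome sizes are roughly 373 Mb for E. seductrix and roughly 270 Mb for H. curvifolium.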
Project description:This study presents the first chromosome-level genome assembly of Hanwoo, an indigenous Korean breed of Bos taurus taurus. This is the first genome assembly of an Asian taurus breed. We also constructed a pangenome graph of 14 B. taurus genome assemblies. The contig N50 was over 55 Mb and the scaffold N50 was over 89 Mb; a genome completeness of 95.8%, as estimated by BUSCO using the mammalian set, indicated a high-quality assembly. Repetitive elements made up 48.7% of the genome, including DNAs, tandem repeats, long interspersed nuclear elements, and simple repeats. A total of 27,314 protein-coding genes were identified, including 25,302 proteins with inferred gene names and 2,012 unknown proteins. The pangenome graph of 14 B. taurus autosomes revealed 528.47 Mb non-reference regions in total and 61.87 Mb Hanwoo-specific regions. Our Hanwoo assembly and pangenome graph provide valuable resources for studying B. taurus populations.