Project description:Our study presents the assembly of a high-quality Taihu goose genome at the Telomere-to-Telomere (T2T) level. By employing advanced sequencing technologies, including Pacific Biosciences HiFi reads, Oxford Nanopore long reads, Illumina short reads, and chromatin conformation capture (Hi-C), we achieved an exceptional assembly. The T2T assembly encompasses a total length of 1,197,991,206 bp, with contigs N50 reaching 33,928,929 bp and scaffold N50 attaining 81,007,908 bp. It consists of 73 scaffolds, including 38 autosomes and one pair of Z/W sex chromosomes. Importantly, 33 autosomes were assembled without any gap, resulting in a contiguous representation. Furthermore, gene annotation efforts identified 34,898 genes, including 436,162 RNA transcripts, encompassing 806,158 exons, 743,910 introns, 651,148 coding sequences (CDS), and 135,622 untranslated regions (UTR). The T2T-level chromosome-scale goose genome assembly provides a vital foundation for future genetic improvement and understanding the genetic mechanisms underlying important traits in geese.
Project description:Here, we report the first telomere-to-telomere genome assembly of matsutake (Tricholoma matsutake), which consists of 13 sequences (spanning 161.0 Mb) and a 76 kb circular mitochondrial genome. All the 13 sequences were supported with telomeric repeats at the ends. GC-rich regions are located at the middle of the sequences and are enriched with long interspersed nuclear elements (LINEs). Repetitive sequences including long-terminal repeats (LTRs) and LINEs occupy 71.6% of the genome. A total of 21,887 potential protein-coding genes were predicted. The genomic data reported in this study served not only matsutake gene sequences but also genome structures and intergenic sequences. The information gained would be a great reference for exploring the genetics, genomics, and evolutionary study of matsutake in the future, and ultimately facilitate the conservation of this vulnerable genetic resource.
Project description:Phaeodactylum tricornutum is a marine diatom with a growing genetic toolbox available and is being used in many synthetic biology applications. While most of the genome has been assembled, the currently available genome assembly is not a completed telomere-to-telomere assembly. Here, we used Oxford Nanopore long reads to build a telomere-to-telomere genome for Phaeodactylum tricornutum. We developed a graph-based approach to extract all unique telomeres, and used this information to manually correct assembly errors. In total, we found 25 nuclear chromosomes that comprise all previously assembled fragments, in addition to the chloroplast and mitochondrial genomes. We found that chromosome 19 has filtered long-read coverage and a quality estimate that suggests significantly less haplotype sequence variation than the other chromosomes. This work improves upon the previous genome assembly and provides new opportunities for genetic engineering of this species, including creating designer synthetic chromosomes.
Project description:With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.
Project description:BackgroundTrichoderma is a genus of fungi in the family Hypocreaceae and includes species known to produce enzymes with commercial use. They are largely found in soil and terrestrial plants. Recently, Trichoderma simmonsii isolated from decaying bark and decorticated wood was newly identified in the Harzianum clade of Trichoderma. Due to a wide range of applications in agriculture and other industries, genomes of at least 12 Trichoderma spp. have been studied. Moreover, antifungal and enzymatic activities have been extensively characterized in Trichoderma spp. However, the genomic information and bioactivities of T. simmonsii from a particular marine-derived isolate remain largely unknown. While we screened for asparaginase-producing fungi, we observed that T. simmonsii GH-Sj1 strain isolated from edible kelp produced asparaginase. In this study, we report a draft genome of T. simmonsii GH-Sj1 using Illumina and Oxford Nanopore technologies. Furthermore, to facilitate biotechnological applications of this species, RNA-sequencing was performed to elucidate the transcriptional profile of T. simmonsii GH-Sj1 in response to asparaginase-rich conditions.ResultsWe generated ~ 14 Gb of sequencing data assembled in a ~ 40 Mb genome. The T. simmonsii GH-Sj1 genome consisted of seven telomere-to-telomere scaffolds with no sequencing gaps, where the N50 length was 6.4 Mb. The total number of protein-coding genes was 13,120, constituting ~ 99% of the genome. The genome harbored 176 tRNAs, which encode a full set of 20 amino acids. In addition, it had an rRNA repeat region consisting of seven repeats of the 18S-ITS1-5.8S-ITS2-26S cluster. The T. simmonsii genome also harbored 7 putative asparaginase-encoding genes with potential medical applications. Using RNA-sequencing analysis, we found that 3 genes among the 7 putative genes were significantly upregulated under asparaginase-rich conditions.ConclusionsThe genome and transcriptome of T. simmonsii GH-Sj1 established in the current work represent valuable resources for future comparative studies on fungal genomes and asparaginase production.
Project description:A complete telomere-to-telomere (T2T) finished genome has been the long pursuit of genomic research. Through generating deep coverage ultralong Oxford Nanopore Technology (ONT) and PacBio HiFi reads, we report here a complete genome assembly of maize with each chromosome entirely traversed in a single contig. The 2,178.6 Mb T2T Mo17 genome with a base accuracy of over 99.99% unveiled the structural features of all repetitive regions of the genome. There were several super-long simple-sequence-repeat arrays having consecutive thymine-adenine-guanine (TAG) tri-nucleotide repeats up to 235 kb. The assembly of the entire nucleolar organizer region of the 26.8 Mb array with 2,974 45S rDNA copies revealed the enormously complex patterns of rDNA duplications and transposon insertions. Additionally, complete assemblies of all ten centromeres enabled us to precisely dissect the repeat compositions of both CentC-rich and CentC-poor centromeres. The complete Mo17 genome represents a major step forward in understanding the complexity of the highly recalcitrant repetitive regions of higher plant genomes.
Project description:Bursaphelenchus okinawaensis is a self-fertilizing, hermaphroditic, fungus-feeding nematode used as a laboratory model for the genus Bursaphelenchus, which includes the important pathogen Bursaphelenchus xylophilus Here, we report the nearly complete genome sequence of B. okinawaensis The 70-Mbp assembly contained six scaffolds (>11 Mbp each) with telomere repeats on their ends, indicating complete chromosomes.
Project description:BackgroundAnatidae contains numerous waterfowl species with great economic value, but the genetic diversity basis remains insufficiently investigated. Here, we report a chromosome-level genome assembly of Lion-head goose (Anser cygnoides), a native breed in South China, through the combination of PacBio, Bionano, and Hi-C technologies.FindingsThe assembly had a total genome size of 1.19 Gb, consisting of 1,859 contigs with an N50 length of 20.59 Mb, generating 40 pseudochromosomes, representing 97.27% of the assembled genome, and identifying 21,208 protein-coding genes. Comparative genomic analysis revealed that geese and ducks diverged approximately 28.42 million years ago, and geese have undergone massive gene family expansion and contraction. To identify genetic markers associated with body weight in different geese breeds, including Wuzong goose, Huangzong goose, Magang goose, and Lion-head goose, a genome-wide association study was performed, yielding an average of 1,520.6 Mb of raw data that detected 44,858 single-mucleotide polymorphisms (SNPs). Genome-wide association study showed that 6 SNPs were significantly associated with body weight and 25 were potentially associated. The significantly associated SNPs were annotated as LDLRAD4, GPR180, and OR, enriching in growth factor receptor regulation pathways.ConclusionsWe present the first chromosome-level assembly of the Lion-head goose genome, which will expand the genomic resources of the Anatidae family, providing a basis for adaptation and evolution. Candidate genes significantly associated with different goose breeds may serve to understand the underlying mechanisms of weight differences.
Project description:The grain sorghum inbred line 654 serves as a parent for numerous Chinese commercial hybrids and recombinant inbred lines (RILs), which have played a pivotal role in the cloning of several agronomically important traits. In this study, we present a telomere-to-telomere (T2T) genome assembly of the inbred line 654 (728.81 Mb) using PacBio HiFi, ultra-long Oxford Nanopore Technology, and Hi-C sequencing data. The T2T genome assembly has high integrity (contains all of 10 centromeres and 20 telomeres without any gaps), high contiguity (contig N90: 52.02 Mb), high completeness (98.33% BUSCO completeness, 98.88% k-mer completeness, and LAI 24.38), and extremely low base error (3.37 × 10-7, QV: 64.72). A total of 62.34% sequences were identified as repetitive, and rest region contained 44,399 protein-coding genes, of which 30,245 were functionally annotated. The gap-free T2T genome assembly enables the full picture of the potential translational genomics, and provides the highest resolution genetic map for future studies on genome evolution, structure variation, and the genetic control of agronomic traits in sorghum breeding.