Project description:Unveiling the potential application of psychrophilic polymerases as candidates for polymerase-nanopore long-read sequencing presents a departure from conventional choices such as thermophilic Bacillus stearothermophilus (Bst) renowned for its limitation in temperature and mesophilic Bacillus subtilis phage (phi29) polymerases for limitations in strong exonuclease activity and weak salt tolerance. Exploiting the PB-Bst fusion DNA polymerases from Psychrobacillus (PB) and Bacillus stearothermophilus (Bst), our structural and biochemical analysis reveal a remarkable enhancement in salt tolerance and a concurrent reduction in exonuclease activity, achieved through targeted substitution of a pivotal functional domain. The sulfolobus 7-kDa protein (Sso7d) emerges as a standout fusion domain, imparting significant improvements in PB-Bst processivity. Notably, this study elucidates additional functional sites regulating exonuclease activity (Asp43 and Glu45) and processivity using artificial nucleotides (Glu266, Gln283, Leu334, Glu335, Ser426, and Asp430). By disclosing the intricate dynamics in exonuclease activity, strand displacement, and artificial nucleotide-based processivity at specific functional sites, our findings not only advance the fundamental understanding of psychrophilic polymerases but also provide novel insights into polymerase engineering.
Project description:BACKGROUND:Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies-based long-read sequencing "nanopore" platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. RESULTS:The Galaxy platform provides a user-friendly interface to computational command line-based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed "NanoGalaxy" is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. CONCLUSIONS:A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.
Project description:RNA molecules can form secondary and tertiary structures that can regulate their localization and function. Using enzymatic or chemical probing together with high-throughput sequencing, secondary structure can be mapped across the entire transcriptome. However, a limiting factor is that only population averages can be obtained since each read is an independent measurement. Although long-read sequencing has recently been used to determine RNA structure, these methods still used aggregate signals across the strands to detect structure. Averaging across the population also means that only limited information about structural heterogeneity across molecules or dependencies within each molecule can be obtained. Here, we present Single-Molecule Structure sequencing (SMS-seq) that combines structural probing with native RNA sequencing to provide non-amplified, structural profiles of individual molecules with novel analysis methods. Our new approach using mutual information enabled single molecule structural interrogation. Each RNA is probed at numerous bases enabling the discovery of dependencies and heterogeneity of structural features. We also show that SMS-seq can capture tertiary interactions, dynamics of riboswitch ligand binding, and mRNA structural features.
Project description:BackgroundThe MinION sequencing instrument from Oxford Nanopore Technologies (ONT) produces long read lengths from single-molecule sequencing - valuable features for detailed genome characterization. To realize the potential of this platform, a number of groups are developing bioinformatics tools tuned for the unique characteristics of its data. We note that these development efforts would benefit from a simulator software, the output of which could be used to benchmark analysis tools.ResultsHere, we introduce NanoSim, a fast and scalable read simulator that captures the technology-specific features of ONT data and allows for adjustments upon improvement of nanopore sequencing technology. The first step of NanoSim is read characterization, which provides a comprehensive alignment-based analysis and generates a set of read profiles serving as the input to the next step, the simulation stage. The simulation stage uses the model built in the previous step to produce in silico reads for a given reference genome. NanoSim is written in Python and R. The source files and manual are available at the Genome Sciences Centre website: http://www.bcgsc.ca/platform/bioinfo/software/nanosim.ConclusionIn this work, we model the base-calling errors of ONT reads to inform the simulation of sequences with similar characteristics. We showcase the performance of NanoSim on publicly available datasets generated using the R7 and R7.3 chemistries and different sequencing kits and compare the resulting synthetic reads to those of other long-sequence simulators and experimental ONT reads. We expect NanoSim to have an enabling role in the field and benefit the development of scalable next-generation sequencing technologies for the long nanopore reads, including genome assembly, mutation detection, and even metagenomic analysis software.
Project description:Bulk RNA sequencing of sorted peri-pancreatic LN cDC1s from different stages of neoplastic development in the KPC mouse model of pancreatic adenocarcinoma.
Project description:Many individuals with muscular dystrophies remain genetically undiagnosed despite clinical diagnostic testing, including exome sequencing. Some may harbor previously undetected structural variants (SVs) or cryptic splice sites. We enrolled 10 unrelated families: nine had muscular dystrophy but lacked complete genetic diagnoses and one had an asymptomatic DMD duplication. Nanopore genomic long-read sequencing identified previously undetected pathogenic variants in four individuals: an SV in DMD, an SV in LAMA2, and two single nucleotide variants in DMD that alter splicing. The DMD duplication in the asymptomatic individual was in tandem. Nanopore sequencing may help streamline genetic diagnostic approaches for muscular dystrophy.
Project description:Choosing the optimum assembly approach is essential to achieving a high-quality genome assembly suitable for comparative and evolutionary genomic investigations. Significant recent progress in long-read sequencing technologies such as PacBio and Oxford Nanopore Technologies (ONT) has also brought about a large variety of assemblers. Although these have been extensively tested on model species such as Homo sapiens and Drosophila melanogaster, such benchmarking has not been done in Mollusca, which lacks widely adopted model species. Molluscan genomes are notoriously rich in repeats and are often highly heterozygous, making their assembly challenging. Here, we benchmarked 10 assemblers based on ONT raw reads from two published molluscan genomes of differing properties, the gastropod Chrysomallon squamiferum (356.6 Mb, 1.59% heterozygosity) and the bivalve Mytilus coruscus (1593 Mb, 1.94% heterozygosity). By optimizing the assembly pipeline, we greatly improved both genomes from previously published versions. Our results suggested that 40-50X of ONT reads are sufficient for high-quality genomes, with Flye being the recommended assembler for compact and less heterozygous genomes exemplified by C. squamiferum, while NextDenovo excelled for more repetitive and heterozygous molluscan genomes exemplified by M. coruscus. A phylogenomic analysis using the two updated genomes with 32 other published high-quality lophotrochozoan genomes resulted in maximum support across all nodes, and we show that improved genome quality also leads to more complete matrices for phylogenomic inferences. Our benchmarking will ensure efficiency in future assemblies for molluscs and perhaps also for other marine phyla with few genomes available. This article is part of the Theo Murphy meeting issue 'Molluscan genomics: broad insights and future directions for a neglected phylum'.
Project description:The number of studies on plant transcriptomes using ONT RNAseq technology is rapidly increasing in recent. It is a powerful method to decipher transcriptomic complexity, particularly alternative splicing (AS) event detection. Citrus plants are the most important widely grown fruit crops. Exploring different AS events in citrus contributes to transcriptome improvement and functional genome study. Here, we performed ONT RNAseq in 9 species (Atalantia buxifolia, Citrus clementina, C. grandis, C. ichangensis, C. reticulata, C. sinensis, Clausena lansium, Fortunella hindsii, and Poncirus trifoliata), accompanied with Illumina sequencing. Non-redundant full-length isoforms were identified between 41,957 and 76,974 per species. Systematic analysis including different types of isoforms, number of isoforms per gene locus, isoform distribution, ORFs and lncRNA prediction and functional annotation were performed mainly focused on novel isoforms, unraveling the capability of novel isoforms detection and characterization. For AS events prediction, A3, RI, and AF were overwhelming types across 9 species. We analyzed isoform similarity and evolutionary relationships in all species. We identified that multiple isoforms derived from orthologous single copy genes among different species were annotated as enzymes, nuclear-related proteins or receptors. Isoforms with extending sequences on 5', 3', or both compared with reference genome were filtered out to provide information for transcriptome improvement. Our results provide novel insight into comprehending complex transcriptomes in citrus and valuable information for further investigation on the function of genes with diverse isoforms.