Pan-genomic matching statistics for targeted nanopore sequencing.
ABSTRACT: Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
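For readers unfamiliar with the term, the sketch below illustrates what matching statistics are, assuming the standard definition: for each position i of a read, the length of the longest prefix of the read's suffix starting at i that occurs somewhere in the reference text. This is a naive illustration only, not SPUMONI's compressed-index algorithm; the function name `matching_statistics` and the toy sequences are hypothetical.

```python
def matching_statistics(pattern: str, text: str) -> list:
    """Naive matching statistics of `pattern` against `text`.

    ms[i] is the length of the longest prefix of pattern[i:] that occurs
    as a substring of text. Illustration only: real tools answer the
    substring queries with a compressed index rather than `in`.
    """
    ms = []
    length = 0
    for i in range(len(pattern)):
        # If pattern[i-1 : i-1+L] occurs in text, then pattern[i : i-1+L]
        # does too, so ms[i] >= ms[i-1] - 1 and we can resume from there.
        length = max(length - 1, 0)
        while (i + length < len(pattern)
               and pattern[i:i + length + 1] in text):
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    reference = "GATTACA"   # toy reference (hypothetical)
    read = "TTACGAT"        # toy read (hypothetical)
    print(matching_statistics(read, reference))
```

Long runs of large values suggest that the read shares long exact substrings with the reference, which is the kind of signal a targeted-sequencing tool could use when deciding whether to keep or eject a molecule.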
Project description:Genomic surveillance of Plasmodium falciparum malaria can provide policy-relevant information about antimalarial drug resistance, diagnostic test failure, and the evolution of vaccine targets. Yet the large and low-complexity genome of P. falciparum complicates the development of genomic methods, while resource constraints in malaria-endemic regions can limit their deployment. Here, we demonstrate an approach for targeted nanopore sequencing of P. falciparum from dried blood spots (DBS) that enables cost-effective genomic surveillance of malaria in low-resource settings. We release software that facilitates flexible design of amplicon sequencing panels and use this software to design two target panels for P. falciparum. The panels generate 3-4 kbp reads for eight and sixteen targets, respectively, covering key drug resistance-associated genes, diagnostic test antigens, polymorphic markers and the vaccine target csp. We validate our approach on mock and field samples, demonstrating robust sequencing coverage, accurate variant calls within coding sequences, and the ability to explore P. falciparum within-sample diversity and to detect deletions underlying rapid diagnostic test failure.
Project description:Despite recent improvements in sequencing methods, there remains a need for assays that provide high sequencing depth and comprehensive variant detection. Current methods [1-4] are limited by the loss of native modifications, short read length, high input requirements, low yield or long protocols. In the present study, we describe nanopore Cas9-targeted sequencing (nCATS), an enrichment strategy that uses targeted cleavage of chromosomal DNA with Cas9 to ligate adapters for nanopore sequencing. We show that nCATS can simultaneously assess haplotype-resolved single-nucleotide variants, structural variations and CpG methylation. We apply nCATS to four cell lines, to a cell-line-derived xenograft, and to normal and paired tumor/normal primary human breast tissue. Median sequencing coverage was 675× using a MinION flow cell and 34× using the smaller Flongle flow cell. The nCATS sequencing requires only ~3 μg of genomic DNA and can target a large number of loci in a single reaction. The method will facilitate the use of long-read sequencing in research and in the clinic.
Project description:Most traits in livestock, crops and humans are polygenic, that is, a large number of loci contribute to genetic variation. Effects at these loci lie along a continuum ranging from common low-effect to rare high-effect variants that cumulatively contribute to the overall phenotype. Statistical methods to calculate the effect of these loci have been developed and can be used to predict phenotypes in new individuals. In agriculture, these methods are used to select superior individuals using genomic breeding values; in humans, these methods are used to quantitatively measure an individual's disease risk, termed polygenic risk scores. Both fields typically use SNP array genotypes for the analysis. Recently, genotyping-by-sequencing has become popular, due to lower cost and greater genome coverage (including structural variants). Oxford Nanopore Technologies' (ONT) portable sequencers have the potential to combine the benefits of genotyping-by-sequencing with portability and decreased turn-around time. This introduces the potential for in-house clinical genetic disease risk screening in humans or calculating genomic breeding values on-farm in agriculture. Here we demonstrate the potential of the latter by calculating genomic breeding values for four traits in cattle using low-coverage ONT sequence data and comparing these breeding values to breeding values calculated from SNP arrays. At sequencing coverages between 2x and 4x, the correlation between ONT breeding values and SNP array-based breeding values was > 0.92 when imputation was used and > 0.88 when no imputation was used. With an average sequencing coverage of 0.5x, the correlation between the two methods was between 0.85 and 0.92 using imputation, depending on the trait. This suggests that ONT sequencing has potential for in-clinic or on-farm genomic prediction; however, further work is needed to validate these findings in a larger population.
Project description:Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show that an individual flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.
Project description:Background: Nonsense-mediated mRNA decay (NMD) is a eukaryotic, translation-dependent degradation pathway that targets mRNAs with premature termination codons and also regulates the expression of some mRNAs that encode full-length proteins. Although many genes express NMD-sensitive transcripts, identifying them based on short-read sequencing data remains a challenge. Results: To identify and analyze endogenous targets of NMD, we apply cDNA Nanopore sequencing and short-read sequencing to human cells with varying expression levels of NMD factors. Our approach detects full-length NMD substrates that are highly unstable and increase in levels or even only appear when NMD is inhibited. Among the many new NMD-targeted isoforms that our analysis identifies, most derive from alternative exon usage. The isoform-aware analysis reveals many genes with significant changes in splicing but no significant changes in overall expression levels upon NMD knockdown. NMD-sensitive mRNAs have more exons in the 3′UTR and, for those mRNAs with a termination codon in the last exon, the length of the 3′UTR per se does not correlate with NMD sensitivity. Analysis of splicing signals reveals isoforms where NMD has been co-opted in the regulation of gene expression, though the main function of NMD seems to be ridding the transcriptome of isoforms resulting from spurious splicing events. Conclusions: Long-read sequencing enables the identification of many novel NMD-sensitive mRNAs and reveals both known and unexpected features concerning their biogenesis and their biological role. Our data provide a highly valuable resource of human NMD transcript targets for future genomic and transcriptomic applications.
Project description:We report a case of murine typhus in China caused by Rickettsia typhi and diagnosed by nanopore targeted sequencing of a bronchoalveolar lavage fluid sample. This case highlights that nanopore targeted sequencing can effectively detect clinically unexplained infections and be especially useful for detecting infections in patients without typical signs and symptoms.
Project description:Background and Objectives: SARS-CoV-2 is the first global threat and life-changing event of the twenty-first century. Although efficient treatments and vaccines have been developed, due to the virus's ability to mutate in key regions of the genome, whole viral genome sequencing is needed for efficient monitoring, evaluation of the spread, and even the adjustment of the molecular diagnostic assays. Materials and Methods: In this study, Nanopore and Ion Torrent sequencing technologies were used to detect the main SARS-CoV-2 circulating strains in Timis County, Romania, between February 2021 and May 2022. Results: We identified 22 virus lineages belonging to seven clades: 20A, 20I (Alpha, V1), 21B (Kappa), 21I (Delta), 21J (Delta), 21K (Omicron), and 21L (Omicron). Conclusions: Results obtained with both methods are comparable, and we confirm the utility of Nanopore sequencing in large-scale epidemiological surveillance due to the lower cost and reduced time for library preparation.
Project description:Actinomycetota (Actinobacteria) is an ecologically and industrially important phylum from which it is challenging to extract pure high-molecular-weight (HMW) DNA. This protocol provides a parallelized, cost-effective, and straightforward approach for consistently extracting pure HMW DNA using modified non-toxic commercial kits suitable for higher-throughput applications. We further provide a workflow for sequencing and assembly of complete genomes using an optimized Oxford Nanopore rapid barcoding protocol and Illumina data error correction.
Project description:Background: Ross River virus (RRV) is Australia's most common and widespread mosquito-transmitted arbovirus and is of significant public health concern. With increasing anthropogenic impacts on wildlife and mosquito populations, it is important that we understand how RRV circulates in its endemic hotspots to determine where public health efforts should be directed. Current surveillance methods are effective in locating the virus but do not provide data on the circulation of the virus and its strains within the environment. This study examined the ability to identify single nucleotide polymorphisms (SNPs) within the variable E2/E3 region by generating full-length haplotypes from a range of mosquito trap-derived samples. Methods: A novel tiled primer amplification workflow for amplifying RRV was developed with analysis using Oxford Nanopore Technologies' MinION and a custom ARTIC/InterARTIC bioinformatic protocol. By creating a range of amplicons across the whole genome, fine-scale SNP analysis was enabled by specifically targeting the variable region, which was amplified as a single fragment and established haplotypes that informed spatial-temporal variation of RRV in the study site in Victoria. Results: A bioinformatic and laboratory pipeline was successfully designed and implemented on mosquito whole trap homogenates. Resulting data showed that genotyping could be conducted in real time and that whole trap consensus of the viruses (with major SNPs) could be determined in a timely manner. Minor variants were successfully detected from the variable E2/E3 region of RRV, which allowed haplotype determination within complex mosquito homogenate samples. Conclusions: The novel bioinformatic and wet laboratory methods developed here will enable fast detection and characterisation of RRV isolates. The concepts presented in this body of work are transferable to other viruses that exist as quasispecies in samples. The ability to detect minor SNPs, and thus haplotype strains, is critically important for understanding the epidemiology of viruses in their natural environment.
Project description:Nonsense-mediated mRNA decay (NMD) is a eukaryotic, translation-dependent degradation pathway that targets mRNAs with premature termination codons and also regulates the expression of some mRNAs that encode full-length proteins. Although many genes express NMD-sensitive transcripts, identifying them based on short-read sequencing data remains a challenge. To identify and analyze endogenous targets of NMD, we applied cDNA Nanopore sequencing and short-read sequencing to human cells with varying expression levels of NMD factors. Our approach detects full-length NMD substrates that are highly unstable and increase in levels or even only appear when NMD is inhibited. Among the many new NMD-targeted isoforms that our analysis identified, most derive from alternative exon usage. The isoform-aware analysis revealed many genes with significant changes in splicing but no significant changes in overall expression levels upon NMD knockdown. NMD-sensitive mRNAs have more exons in the 3′UTR and, for mRNAs with a termination codon in the last exon, the length of the 3′UTR per se does not correlate with NMD sensitivity. Analysis of splicing signals reveals isoforms where NMD has been co-opted in the regulation of gene expression, but its main function is most likely to rid the transcriptome of isoforms resulting from spurious splicing events. Long-read sequencing enabled the identification of many novel NMD-sensitive mRNAs and revealed both known and unexpected features concerning their biogenesis and their biological role. Our data provide a highly valuable resource of human NMD transcript targets for future genomic and transcriptomic applications.