ECNano: A Cost-Effective Workflow for Target Enrichment Sequencing and Accurate Variant Calling on 4,800 Clinically Significant Genes Using a Single MinION Flowcell
ABSTRACT: Target enrichment sequencing and variant calling on medical exome using ONT MinION
Project description:Background: The application of long-read sequencing using the Oxford Nanopore Technologies (ONT) MinION sequencer is becoming more diverse in the medical field. However, the high sequencing error rate of ONT and the limited throughput of a single MinION flowcell limit its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants. By applying MES with MinION sequencing, the technology can achieve a more uniform capture of the target regions, shorter turnaround time, and lower sequencing cost per sample. Method: We introduced a cost-effective optimized workflow, ECNano, comprising a wet-lab protocol and bioinformatics analysis, for accurate variant detection at 4800 clinically important genes and regions using a single MinION flowcell. The ECNano wet-lab protocol was optimized to perform long-read target enrichment and ONT library preparation to stably generate high-quality MES data with adequate coverage. The subsequent variant-calling workflow, Clair-ensemble, adopted a fast RNN-based variant caller, Clair, and was optimized for target enrichment data. To evaluate its performance and practicality, ECNano was tested on both reference DNA samples and patient samples. Results: ECNano achieved deep on-target depth of coverage (DoC), averaging > 100×, and > 98% uniformity using one MinION flowcell. For accurate ONT variant calling, the generated reads sufficiently covered 98.9% of pathogenic positions listed in ClinVar, with 98.96% having at least 30× DoC. ECNano obtained an average read length of 1000 bp. The long reads of ECNano also covered the adjacent splice sites well, with 98.5% of positions having ≥ 30× DoC. Clair-ensemble achieved > 99% recall and accuracy for SNV calling.
The whole workflow from wet-lab protocol to variant detection was completed within three days. Conclusion: We presented ECNano, an out-of-the-box workflow comprising (1) a wet-lab protocol for ONT target enrichment sequencing and (2) a downstream variant detection workflow, Clair-ensemble. The workflow is cost-effective, with a short turnaround time for high-accuracy variant calling in 4800 clinically significant genes and regions using a single MinION flowcell. The long-read exon-captured data has potential for further development, promoting the application of long-read sequencing in personalized disease treatment and risk prediction.
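The coverage figures quoted in the ECNano abstract (mean on-target DoC, share of positions with at least 30× DoC, uniformity) can be illustrated with a minimal sketch. The function name `coverage_summary`, the example depths, and the uniformity definition used here (share of positions with depth of at least 0.2× the mean) are assumptions for illustration; the abstract does not state how uniformity was defined.

```python
from statistics import mean


def coverage_summary(depths, min_depth=30, uniformity_fraction=0.2):
    """Summarise per-base depth over a set of target positions.

    depths: list of integer read depths, one per target position.
    min_depth: depth threshold (30x mirrors the figure quoted in the abstract).
    uniformity_fraction: ASSUMED definition, i.e. a position counts as
        uniformly covered if its depth is >= this fraction of the mean depth.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    n = len(depths)
    frac_ge_min = sum(1 for d in depths if d >= min_depth) / n
    uniformity = sum(1 for d in depths if d >= uniformity_fraction * avg) / n
    return {
        "mean_depth": avg,
        "frac_ge_min_depth": frac_ge_min,
        "uniformity": uniformity,
    }


# Hypothetical per-position depths, not data from the study.
example_depths = [120, 95, 110, 28, 105, 0, 130, 100]
summary = coverage_summary(example_depths)
print(summary)
```

In practice the per-position depths would come from a depth-reporting tool run over the target regions; the list above only stands in for that input.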
Project description:Genetic markers (DNA barcodes) are often used to support and confirm species identification. Barcode sequences can be generated in the field using portable systems based on the Oxford Nanopore Technologies (ONT) MinION sequencer. However, to achieve a broader application, current proof-of-principle workflows for on-site barcoding analysis must be standardized to ensure a reliable and robust performance under suboptimal field conditions without increasing costs. Here, we demonstrate the implementation of a new on-site workflow for DNA extraction, PCR-based barcoding, and the generation of consensus sequences. The portable laboratory features inexpensive instruments that can be carried as hand luggage and uses standard molecular biology protocols and reagents that tolerate adverse environmental conditions. Barcodes are sequenced using MinION technology and analyzed with ONTrack, an original de novo assembly pipeline that requires as few as 1000 reads per sample. ONTrack-derived consensus barcodes have a high accuracy, ranging from 99.8 to 100%, despite the presence of homopolymer runs. The ONTrack pipeline has a user-friendly interface and returns consensus sequences in minutes. The remarkable accuracy and low computational demand of the ONTrack pipeline, together with the inexpensive equipment and simple protocols, make the proposed workflow particularly suitable for tracking species under field conditions.
Project description:The Oxford Nanopore (ONT) platform provides portable and rapid genome sequencing, and its ability to natively profile DNA methylation without complex sample processing is attractive for clinical sequencing. We recently demonstrated ONT shallow whole-genome sequencing (WGS) to detect copy number alterations (CNA) from the circulating tumor DNA (ctDNA) of cancer patients. Here, we show that cell-type and cancer-specific methylation changes can also be detected, as well as cancer-associated fragmentation signatures. This feasibility study suggests that ONT shallow WGS could be a powerful tool for liquid biopsy, especially for real-time medical applications.
Project description:Background: Due to the frequent reassortment and zoonotic potential of influenza A viruses, rapid gain of sequence information is crucial. Alongside established next-generation sequencing protocols, the MinION sequencing device (Oxford Nanopore Technologies) has become a serious competitor for routine whole-genome sequencing. Here, we established a novel, rapid and high-throughput MinION multiplexing workflow based on a universal RT-PCR. Methods: Twelve representative influenza A virus samples of multiple subtypes were universally amplified in a one-step RT-PCR and subsequently sequenced on the MinION instrument in conjunction with a barcoding library preparation kit from the rapid family and the MinIT performing live base-calling. The identical PCR products were sequenced on an IonTorrent platform and, after final consensus assembly, all data were compared for validation. To prove the practicability of the MinION-MinIT method in human and veterinary diagnostics, we sequenced recent and historical influenza strains for further benchmarking. Results: The MinION-MinIT combination generated over two million reads for twelve samples in a six-hour sequencing run, of which a total of 72% were classified as quality-screened, trimmed and mapped influenza reads to produce full genome sequences. Identities between the datasets of > 99.9% were achieved, with 100% coverage of all segments alongside sufficient confidence and 4492-fold mean depth. From RNA extraction to finished sequences, only 14 h were required. Conclusions: Overall, we developed and validated a novel and rapid multiplex workflow for influenza A virus sequencing. This protocol suits both clinical and academic settings, aiding in real-time diagnostics and passive surveillance.
Project description:The MinION device by Oxford Nanopore produces very long reads (reads over 100 kbp have been reported); however, it suffers from a high sequencing error rate. We present an open-source DNA base caller based on deep recurrent neural networks and show that the accuracy of base calling depends strongly on the underlying software and can be improved by applying modern machine learning methods. By employing carefully crafted recurrent neural networks, our tool significantly improves base calling accuracy on data from the R7.3 version of the platform compared to the default base caller supplied by the manufacturer. On the R9 version, we achieve results comparable to the Nanonet base caller provided by Oxford Nanopore. The availability of an open-source tool with high base calling accuracy will be useful for the development of new applications of the MinION device, including infectious disease detection and custom target enrichment during sequencing.
Project description:Background: Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. Thus, the current standard approach requires the use of short paired-end reads, from which SVs remain challenging to detect, especially at the scale of hundreds to thousands of samples. Findings: We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available. Conclusion: Parliament2 provides a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.
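For readers unfamiliar with the F1 score reported for Parliament2, the following minimal sketch shows how precision, recall, and F1 are derived from counts of true-positive, false-positive, and false-negative calls against a truth set. The function name and the counts are hypothetical and are not taken from the Parliament2 evaluation.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from call counts against a truth set.

    tp: calls that match a truth-set variant
    fp: calls with no match in the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a caller cannot score well by maximizing recall at the expense of precision or vice versa.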
Project description:Illumina 1M Omni Quad arrays were used to test the mutation-calling accuracy of the qSNP tool (a mutation caller). Illumina array genotypes with a GenCall (GC) score > 0.70 were used to compare genotype calls against those made by qSNP from next-generation sequencing data.
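The comparison described above (keep array genotypes whose GC score exceeds 0.70, then compare them with sequencing-derived calls) can be sketched as follows. The data layout, the function name `genotype_concordance`, and the example calls are assumptions for illustration; this does not reproduce the qSNP tool or its output format.

```python
def genotype_concordance(array_calls, seq_calls, min_gc=0.70):
    """Count array genotypes that pass the GC filter and agree with sequencing.

    array_calls: dict mapping site ID -> (genotype string, GC score)
    seq_calls:   dict mapping site ID -> genotype string
    Genotypes are unphased two-letter strings, so "AG" and "GA" are equal.
    Returns (number of sites compared, number of matching sites).
    """
    compared = 0
    matched = 0
    for site, (genotype, gc_score) in array_calls.items():
        # Keep only confident array genotypes that also have a sequencing call.
        if gc_score <= min_gc or site not in seq_calls:
            continue
        compared += 1
        if sorted(genotype) == sorted(seq_calls[site]):
            matched += 1
    return compared, matched


# Hypothetical calls for illustration only.
array_calls = {
    "rs1": ("AG", 0.91),
    "rs2": ("CC", 0.55),  # dropped: GC score not above 0.70
    "rs3": ("TT", 0.88),
    "rs4": ("GA", 0.75),
}
seq_calls = {"rs1": "GA", "rs2": "CT", "rs3": "TC", "rs4": "AG"}

compared, matched = genotype_concordance(array_calls, seq_calls)
print(f"{matched}/{compared} filtered array genotypes agree with sequencing")
```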
Project description:Calling cards technology using self-reporting transposons enables the identification of DNA-protein interactions through RNA sequencing. Here, we have drastically reduced the cost and labor requirements of calling card experiments in bulk populations of cells by introducing a DNA barcode into the calling card itself. An additional barcode incorporated during reverse transcription enables simultaneous transcriptome measurement in a facile and affordable protocol. We demonstrate that barcoded self-reporting transposons recover in vitro binding sites for four basic helix-loop-helix transcription factors with important roles in cell fate specification: ASCL1, MYOD1, NEUROD2, and NGN1. Further, simultaneous calling cards and transcriptional profiling during transcription factor overexpression identified both binding sites and gene expression changes for two of these factors. In sum, RNA-based identification of transcription factor binding sites and gene expression through barcoded self-reporting transposon calling cards and transcriptomes is an efficient and powerful method to infer gene regulatory networks in a population of cells.
Project description:Alternative splicing contributes to transcriptomic complexity and plays a role in the regulation of cellular identity and function, but the correct assembly of transcripts of complex loci, as well as their quantification based on short-read sequencing, is non-trivial. Recent long-read sequencing methods such as those from ONT and PacBio overcome these problems by potentially sequencing full transcripts. The activation of brown adipose tissue, e.g., by exposure to reduced ambient temperature (cold), positively affects metabolism by increasing energy expenditure and releasing endocrine factors, and has been shown to involve specific alternative splicing events. Here we assessed important features of ONT long-read sequencing protocols in relation to Illumina short-read sequencing: (i) alignment characteristics to the reference genome and transcriptome, (ii) gene and transcript detection and quantification, (iii) detection of differential gene and transcript expression events, (iv) transcriptome reannotation, and (v) detection of differential transcript usage events. We find that ONT long-read sequencing is advantageous in terms of transcriptome reassembly, especially when the reads are enriched for full-length reads. Illumina sequencing, due to the higher number of counts available, has a higher statistical power for calling differentially expressed/used features, whereas long-read sequencing has a lower risk of calling false-positive events due to the better ability to unambiguously map reads to transcripts. Finally, we describe novel transcript isoforms in cold-activated murine iBAT reassembled from ONT long reads.