Project description:The advent of next-generation sequencing (NGS) has accelerated biomedical research by enabling the high-throughput analysis of DNA sequences at a very low cost. However, NGS has limitations in detecting rare-frequency variants (< 1%) because of high sequencing error rates (> 0.1–1%). NGS errors can be filtered out using molecular barcodes, by comparing read replicates among those with the same barcodes. Accordingly, these barcoding methods require redundant reads of non-target sequences, resulting in high sequencing cost. Here, we present a cost-effective, barcode-free NGS error validation method. By physically extracting and individually amplifying the DNA clones of erroneous reads, we distinguish true variants of frequency > 0.003% from systematic NGS errors and selectively validate NGS errors after NGS. We achieve a PCR-induced error rate of 2.5 × 10⁻⁶ per base per doubling event, using 10 times fewer sequencing reads than previous studies.
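The barcode-based filtering that this abstract contrasts itself with (comparing read replicates that share a barcode) can be sketched roughly as follows. This is a minimal illustration, not the method of any cited study: the function name `barcode_consensus`, the thresholds `min_family_size` and `min_agreement`, and the assumption that reads in a family are already aligned and of equal length are all hypothetical.

```python
from collections import Counter, defaultdict


def barcode_consensus(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a barcode into one consensus sequence.

    reads: iterable of (barcode, sequence) pairs. Sequences within a
    barcode family are assumed to be aligned and of equal length
    (a simplification for illustration).
    Positions where the most common base is seen in fewer than
    min_agreement of the family's reads are reported as 'N'.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few replicates to out-vote a sequencing error
        length = len(seqs[0])
        if any(len(s) != length for s in seqs):
            continue  # equal-length assumption violated; skip this family
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus
```

The sketch also shows where the cost noted in the abstract comes from: every barcode family needs several redundant reads before a consensus can be called at all.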
Project description:BACKGROUND: Pediatric leukemias have a diverse genomic landscape associated with complex structural variants, including gene fusions, insertions and deletions, and single nucleotide variants. Routine karyotype and fluorescence in situ hybridization (FISH) techniques lack sensitivity for smaller genomic alterations. Next-generation sequencing (NGS) assays are being increasingly utilized for assessment of these various lesions. However, standard NGS lacks quantitative sensitivity for minimal residual disease (MRD) surveillance due to an inherently high error rate. METHODS: Primary bone marrow samples from pediatric leukemia (n = 32) and adult leukemia subjects (n = 5), cell line MV4-11, and an umbilical cord sample were utilized for this study. Samples were sequenced using molecular barcoding with targeted DNA and RNA library enrichment techniques based on anchored multiplexed PCR (AMP®) technology, amplicon-based error-corrected sequencing (ECS), or a human cancer transcriptome assay. Computational analyses were performed to quantitatively assess the limit of detection (LOD) for various DNA and RNA lesions, which could be systematically used for MRD assays. RESULTS: Matched leukemia patient samples were analyzed at three time points: diagnosis, end of induction (EOI), and relapse. Similar to flow cytometry for ALL MRD, the LOD for point mutations by these sequencing strategies was ?0.001. For DNA structural variants, FLT3 internal tandem duplication (ITD) positive cell line and patient samples showed a LOD of ?0.001 in addition to previously unknown copy number losses in leukemia genes. ECS in RNA identified multiple novel gene fusions, including a SPANT-ABL gene fusion in an ALL patient, which could have been used to alter therapy. Collectively, ECS for RNA demonstrated a quantitative and complex landscape of RNA molecules with 12% of the molecules representing gene fusions, 12% exon duplications, 8% exon deletions, and 68% with retained introns. Droplet digital PCR validation of ECS-RNA confirmed results to single mRNA molecule quantities. CONCLUSIONS: Collectively, these assays enable a highly sensitive, comprehensive, and simultaneous analysis of various clonal leukemic mutations, which can be tracked across disease states (diagnosis, EOI, and relapse). The approaches and results presented here highlight the ability to use NGS for MRD tracking.
Project description:Advances in high-throughput sequencing have enabled technologies that probe the adaptive immune system with unprecedented depth. We have developed a multiplex PCR method to sequence tens of millions of T cell receptors (TCRs) from a single sample in a few days. A method is presented to test the precision, accuracy, and sensitivity of this assay. T cell clones, each with one fixed productive TCR rearrangement, are doped into complex blood cell samples. TCRs from a total of eleven samples are sequenced, with the doped T cell clones ranging from 10% of the total sample to 0.001% (one cell in 100,000). The assay is able to detect even the rarest clones. The precision of the assay is demonstrated across five orders of magnitude. The accuracy for each clone is within an overall factor of three across the 100,000-fold dynamic range. Additionally, the assay is shown to be highly repeatable.
Project description:Error-corrected sequences (ECSs) that utilize double-stranded DNA sequences are useful in detecting mutagen-induced mutations. However, relatively high frequencies of G:C > T:A (1 × 10⁻⁷ bp) and G:C > C:G (2 × 10⁻⁷ bp) errors decrease the accuracy of detection of rare G:C mutations (approximately 10⁻⁷ bp). Oxidized guanines in single-strand (SS) overhangs generated after shearing could serve as the source of these errors. To remove these errors, we first computationally discarded up to 20 read bases corresponding to the ends of the DNA fragments. Error frequencies decreased in proportion to trimming length; however, the results indicated that the errors were not sufficiently removed. To efficiently remove SS overhangs, we evaluated three mechanistically distinct SS-specific nucleases (S1 Nuclease, mung bean nuclease, and RecJf exonuclease) and found that they were more efficient than computational trimming. Consequently, we established Jade-Seq™, an ECS protocol with S1 Nuclease treatment, which reduced G:C > T:A and G:C > C:G errors to 0.50 × 10⁻⁷ bp and 0.12 × 10⁻⁷ bp, respectively. This was probably because S1 Nuclease, owing to its wide substrate specificity, removed SS regions such as gaps and nicks. Subsequently, we evaluated the mutation-detection sensitivity of Jade-Seq™ using DNA samples from TA100 cells exposed to 3-methylcholanthrene and 7,12-dimethylbenz[a]anthracene, which contained the rare G:C > T:A mutation (i.e., 2 × 10⁻⁷ bp). Fold changes of G:C > T:A compared to the vehicle control were 1.2- and 1.3-times higher than those of samples without S1 Nuclease treatment, respectively. These findings indicate the potential of Jade-Seq™ for detecting rare mutations and determining the mutagenicity of environmental mutagens.
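The computational trimming step mentioned in this abstract (discarding up to 20 read bases at the fragment ends) could look roughly like the sketch below. It is an illustration only: the function name `trim_fragment_ends`, the choice to trim both ends symmetrically, and the handling of reads too short to trim are assumptions, not details taken from the Jade-Seq™ protocol.

```python
def trim_fragment_ends(sequence, qualities, trim_length=20):
    """Discard trim_length bases from each end of a read.

    sequence and qualities are equal-length strings (bases and
    per-base quality characters, as in a FASTQ record).
    Trimming both ends symmetrically is an assumption made for
    illustration. Reads with no bases left after trimming are
    returned as empty strings so the caller can drop them.
    """
    if trim_length < 0:
        raise ValueError("trim_length must be non-negative")
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have equal length")
    if trim_length == 0:
        return sequence, qualities
    if len(sequence) <= 2 * trim_length:
        return "", ""
    return sequence[trim_length:-trim_length], qualities[trim_length:-trim_length]
```

Trimming of this kind only removes bases that fall within a fixed distance of the fragment end, which is consistent with the abstract's observation that nuclease treatment removed the errors more efficiently.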
Project description:BACKGROUND: Circulating free DNA sequencing (cfDNA-Seq) can portray cancer genome landscapes, but highly sensitive and specific technologies are necessary to accurately detect mutations with often low variant frequencies. METHODS: We developed a customizable hybrid-capture cfDNA-Seq technology using off-the-shelf molecular barcodes and a novel duplex DNA molecule identification tool for enhanced error correction. RESULTS: Modeling based on cfDNA yields from 58 patients showed that this technology, requiring 25 ng of cfDNA, could be applied to >95% of patients with metastatic colorectal cancer (mCRC). cfDNA-Seq of a 32-gene, 163.3-kbp target region detected 100% of single-nucleotide variants, with 0.15% variant frequency in spike-in experiments. Molecular barcode error correction reduced false-positive mutation calls by 97.5%. In 28 consecutively analyzed patients with mCRC, 80 out of 91 mutations previously detected by tumor tissue sequencing were called in the cfDNA. Call rates were similar for point mutations and indels. cfDNA-Seq identified typical mCRC driver mutations in patients in whom biopsy sequencing had failed or did not include key mCRC driver genes. Mutations only called in cfDNA but undetectable in matched biopsies included a subclonal resistance driver mutation to anti-EGFR antibodies in KRAS, parallel evolution of multiple PIK3CA mutations in 2 cases, and TP53 mutations originating from clonal hematopoiesis. Furthermore, cfDNA-Seq off-target read analysis allowed simultaneous genome-wide copy number profile reconstruction in 20 of 28 cases. Copy number profiles were validated by low-coverage whole-genome sequencing. CONCLUSIONS: This error-corrected, ultradeep cfDNA-Seq technology with a customizable target region and publicly available bioinformatics tools enables broad insights into cancer genomes and evolution. ClinicalTrials.gov identifier: NCT02112357.
Project description:Droplet-based high-throughput single-cell sequencing techniques have tremendously advanced our insight into cell-to-cell heterogeneity. However, those approaches only allow analysis of one extremity of the transcript after short-read sequencing. As a consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or missing or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI-guided error correction makes it possible to generate high-accuracy full-length sequence information with the 10x Genomics single-cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows splicing and SNVs (RNA editing) to be defined at the single-cell level.
Project description:E18 mouse brain single-cell profiling using the 10x Genomics Chromium instrument workflow with either Illumina short-read sequencing for standard gene profiling or Nanopore PromethION long-read sequencing for isoform profiling.
Project description:There is a limited understanding about the impact of rare protein-truncating variants across multiple phenotypes. We explore the impact of this class of variants on 13 quantitative traits and 10 diseases using whole-exome sequencing data from 100,296 individuals. Protein-truncating variants in genes intolerant to this class of mutations increased risk of autism, schizophrenia, bipolar disorder, intellectual disability, and ADHD. In individuals without these disorders, there was an association with shorter height, lower education, increased hospitalization, and reduced age at enrollment. Gene sets implicated from GWASs did not show a significant protein-truncating variants burden beyond what was captured by established Mendelian genes. In conclusion, we provide a thorough investigation of the impact of rare deleterious coding variants on complex traits, suggesting widespread pleiotropic risk.
Project description:The tracking of leukemic clones in acute myeloid leukemia (AML) promises deeper insights into disease development and therapeutic options. We therefore established a fluorescent genetic barcoding (FGB) labeling approach that allows for flow cytometric tracking of color-coded clones in vitro and in vivo. In Hoxa9 and Meis1 (H9M) dependent murine AML, we tracked the growth behavior of 24 clones in parallel and enriched for pre-leukemic clones as well as their de novo expanded counterparts and stably expanded clones from leukemic mice by fluorescence-activated cell sorting. These samples were subjected to RNA sequencing for the assessment of transcriptional changes underlying clonal maintenance and expansion.
Project description:Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of ~1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when "deep sequencing" genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, we have developed a method termed Duplex Sequencing. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error. We determine that Duplex Sequencing has a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced. In addition, we establish that detection of mutations present in only one of the two strands of duplex DNA can be used to identify sites of DNA damage. We apply the method to directly assess the frequency and pattern of random mutations in mitochondrial DNA from human cells.
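The strand-agreement rule described in this abstract (a true mutation appears at the same position in both strands, whereas a PCR or sequencing error appears in only one) can be sketched as follows. This is a simplified illustration, not the published Duplex Sequencing pipeline: the function names, the assumption that each strand has already been collapsed to a single-strand consensus of equal length, and the use of 'N' for disagreeing positions are all hypothetical choices.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def duplex_consensus(top_strand, bottom_strand):
    """Combine the consensus sequences of the two strands of one duplex.

    top_strand: single-strand consensus, written 5'->3' on the top strand.
    bottom_strand: single-strand consensus of the complementary strand,
    written 5'->3' on its own strand (so it is reverse-complemented here).
    Positions where the two strands disagree are masked as 'N'.
    """
    bottom_as_top = reverse_complement(bottom_strand)
    if len(top_strand) != len(bottom_as_top):
        raise ValueError("strand consensus sequences must have equal length")
    return "".join(a if a == b else "N" for a, b in zip(top_strand, bottom_as_top))


def duplex_variants(reference, top_strand, bottom_strand):
    """List (position, ref_base, alt_base) supported by both strands."""
    consensus = duplex_consensus(top_strand, bottom_strand)
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, consensus))
        if alt != "N" and alt != ref
    ]
```

For example, with reference `ACGTAC`, a G-to-T change present in both strands is reported, while the same change present only in the top strand is masked and yields no variant call. The masked single-strand positions correspond to what the abstract proposes as candidate sites of DNA damage.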