Project description:MOTIVATION: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth are based on counting the number of reads mapped to the genome or subgenomic regions. Such methods are sensitive to the mapping quality. The presence of contamination or the large deviance of an individual genome from the reference may introduce bias in depth estimation. RESULTS: Here, we present an algorithm and implementation for estimating both the sequencing depth and error rate from unmapped reads using a uniquely filtered k-mer set. On simulated reads with 20× coverage, the margin of error was less than 0.01%. At 0.01× coverage and the presence of 10-fold contamination, the precision was within 2% for depth and within 10% for error rate. AVAILABILITY AND IMPLEMENTATION: DOCEST program and database can be downloaded from https://bioinfo.ut.ee/docest/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
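The general idea of estimating depth from unmapped reads with a set of genome-unique k-mers can be sketched as follows. This is a minimal illustration, not the DOCEST algorithm: the function name `estimate_depth` and its parameters are hypothetical, reverse-complement k-mers and sequencing errors are ignored, and the conversion from k-mer coverage to base coverage uses the standard relation C = C_k · L / (L − k + 1), which assumes read start positions are spread uniformly over a genome much longer than the reads.

```python
from collections import Counter


def estimate_depth(reads, unique_kmers, k):
    """Estimate base-level sequencing depth from counts of genome-unique k-mers.

    reads        -- iterable of read sequences (strings); hypothetical input
    unique_kmers -- set of k-mers assumed to occur exactly once in the genome
    k            -- k-mer length
    """
    counts = Counter()
    total_bases = 0
    n_reads = 0
    for read in reads:
        n_reads += 1
        total_bases += len(read)
        # Slide a window of length k over the read and count hits
        # against the unique k-mer set (forward strand only).
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in unique_kmers:
                counts[kmer] += 1

    if n_reads == 0 or not unique_kmers:
        return 0.0

    mean_len = total_bases / n_reads
    if mean_len < k:
        return 0.0

    # Mean number of observations per unique k-mer (k-mer coverage).
    # K-mers never observed contribute zero to the sum.
    kmer_coverage = sum(counts.values()) / len(unique_kmers)

    # A read of length L yields only L - k + 1 k-mers, so k-mer coverage
    # underestimates base coverage by the factor (L - k + 1) / L.
    return kmer_coverage * mean_len / (mean_len - k + 1)
```

Because no read is ever aligned, an estimate of this kind does not depend on mapping quality, which is the motivation stated in the abstract. A real implementation would also have to model erroneous k-mers and contamination, which this sketch does not attempt.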
Project description:Liquid biopsy with circulating tumor DNA (ctDNA) profiling by next-generation sequencing holds great promise to revolutionize clinical oncology. It rests on the premise that ctDNA reflects the real-time status of the tumor genome, including its genetic alterations. Compared to tissue biopsy, liquid biopsy offers clear advantages such as a less demanding procedure, minimal invasiveness, ease of frequent sampling, and less sampling bias. Next-generation sequencing (NGS) methods have reached a point where both the cost and performance are suitable for clinical diagnosis. Thus, profiling ctDNA by NGS technologies is becoming more and more popular since it can be applied throughout the process of cancer diagnosis and management. Further developments of liquid biopsy ctDNA testing will be beneficial for cancer patients, paving the way for precision medicine. In conclusion, profiling ctDNA with NGS for cancer diagnosis is both biologically sound and technically convenient.
Project description:The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. First, a method for correction of next generation sequencing error in the distribution of nucleotides at a site is developed. Second, a package of methods for assessment of nucleotide diversity is assembled. The error correction method is statistically based and works at the level of the nucleotide distribution rather than the level of individual nucleotides. The method relies on an error model and a sample of known viral genotypes that is used for model calibration. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated. The methods are illustrated using honeybee viral samples. Software in both Excel and Matlab and a guide are available at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/, the Warwick University Systems Biology Centre software download site.
Project description:BACKGROUND: Analysis of circulating tumor nucleic acids in plasma of Non-Small Cell Lung Cancer (NSCLC) patients is the most widespread and documented form of "liquid biopsy" and provides real-time information on the molecular profile of the tumor without an invasive tissue biopsy. METHODS: Liquid biopsy analysis was requested by the referral physician in 121 NSCLC patients at diagnosis and was performed using a sensitive Next Generation Sequencing (NGS) assay. Additionally, a comparative analysis of NSCLC patients at relapse following treatment with EGFR Tyrosine Kinase Inhibitors (TKIs) was performed in 50 patients by both the cobas and NGS platforms. RESULTS: At least one mutation was identified in almost 49% of the cases by the NGS approach in NSCLC patients analyzed at diagnosis. In 36 cases with paired tissue available, a high concordance of 86.11% was observed for clinically relevant mutations, with a Positive Predictive Value (PPV) of 88.89%. Furthermore, a concordance rate of 82% between cobas and the NGS approach for the EGFR sensitizing mutations (in exons 18, 19, 21) was observed in patients with acquired resistance to EGFR TKIs, while this concordance was 94% for the p.T790M mutation, with NGS being able to detect this mutation in three additional patients. CONCLUSIONS: This study indicates the feasibility of circulating tumor nucleic acids (ctNA) analysis as a tumor biopsy surrogate in clinical practice for NSCLC personalized treatment decision making. The use of new sensitive NGS techniques can reliably detect tumor-derived mutations in liquid biopsy and provide clinically relevant information both before and after targeted treatment in patients with NSCLC. Thus, it could aid physicians in treatment decision making in clinical practice.
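Concordance and positive predictive value of the kind reported above are typically derived from paired calls. The sketch below assumes one binary call per patient (mutation detected or not) in plasma and in tissue, with tissue treated as the reference; the study may have counted per mutation rather than per patient, and the function name `concordance_and_ppv` and its inputs are hypothetical.

```python
def concordance_and_ppv(plasma_calls, tissue_calls):
    """Return (concordance, ppv) for paired binary calls.

    plasma_calls -- list of bools, True if a mutation was called in plasma
    tissue_calls -- list of bools, True if it was called in tissue (reference)
    Both lists are assumed to be ordered by patient.
    """
    if len(plasma_calls) != len(tissue_calls):
        raise ValueError("plasma and tissue calls must be paired")

    pairs = list(zip(plasma_calls, tissue_calls))
    if not pairs:
        return float("nan"), float("nan")

    # Concordance: fraction of pairs where both assays agree.
    agree = sum(1 for plasma, tissue in pairs if plasma == tissue)
    concordance = agree / len(pairs)

    # PPV: of the positive plasma calls, the fraction confirmed in tissue.
    true_pos = sum(1 for plasma, tissue in pairs if plasma and tissue)
    false_pos = sum(1 for plasma, tissue in pairs if plasma and not tissue)
    called = true_pos + false_pos
    ppv = true_pos / called if called else float("nan")

    return concordance, ppv
```

Treating tissue as the reference is a design choice: a plasma-only mutation counts as a false positive here, although it may reflect tumor heterogeneity that the tissue sample missed.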
Project description:Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.
Project description:Differential presence of exons (DPE) by next generation sequencing (NGS) is a method of interpretation of whole exome sequencing. This method has been proposed to design a predictive and diagnostic algorithm with clinical value in plasma from patients bearing colorectal cancer (CRC). The aim of the present study was to determine a common exonic signature to discriminate between different clinical pictures, such as non-metastatic, metastatic and non-disease (healthy), using a sustainable and novel technology in liquid biopsy. Through DPE analysis, we determined the differences in DNA exon levels circulating in plasma in three comparisons: patients bearing CRC vs. healthy, patients bearing CRC metastasis vs. non-metastatic, and patients bearing CRC metastasis vs. healthy. We identified a set of 510 exons (469 up and 41 down) whose differential presence in plasma allowed us to group and classify between the three cohorts. Random forest classification (machine learning) was performed and an estimated out-of-bag (OOB) error rate of 35.9% was obtained and the predictive model had an accuracy of 75% with a confidence interval (CI) of 56.6-88.5. In conclusion, the DPE analysis allowed us to discriminate between different patho-physiological statuses such as metastatic, non-metastatic and healthy donors. In addition, this analysis allowed us to obtain very significant values with respect to previously published results, since we increased the number of samples in our study. These results suggest that circulating DNA in patients' plasma may be actively released by cells and may be involved in intercellular communication and, therefore, may play a pivotal role in malignant transformation (genometastasis).
Project description:BACKGROUND: 454 sequencing is currently the method of choice for sequencing of antibody repertoires and libraries containing large numbers (10^6 to 10^12) of different molecules with similar frameworks and variable regions, which poses significant challenges for identifying sequencing errors. Identification and correction of sequencing errors in such mixtures is especially important for the exploration of complex maturation pathways and identification of putative germline predecessors of highly somatically mutated antibodies. To quantify and correct errors incorporated in 454 antibody sequencing, we sequenced six antibodies at different known concentrations twice over and compared them with the corresponding known sequences as determined by standard Sanger sequencing. RESULTS: We found that 454 antibody sequencing could lead to approximately 20% incorrect reads due to insertions that were mostly found at shorter homopolymer regions of 2-3 nucleotide length, and less so by insertions, deletions and other variants at random sites. Correction of errors might reduce this population of erroneous reads down to 5-10%. However, a certain number of errors, accounting for 4-8% of the total reads, could not be corrected unless sequencing is repeated several times, although this may not be possible for large diverse libraries and repertoires including complete sets of antibodies (antibodyomes). CONCLUSIONS: The experimental test procedure carried out for assessing 454 antibody sequencing errors reveals a high proportion (up to 20%) of incorrect reads; the errors can be reduced down to 5-10% but not less, which suggests the use of caution to avoid false discovery of antibody variants and diversity.
Project description:BACKGROUND: Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology; however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Distinguishing sequencing errors from true biological variants is a challenging task. For organisms without a reference genome this task is even more difficult. RESULTS: We have developed a method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. It does not require a reference genome and is of relevance for microRNA studies, unsequenced genomes, variant detection in ultra-deep sequencing and even for RNA-Seq studies of organisms with sequenced genomes where RNA editing is being considered. CONCLUSIONS: The derived error model is novel in that it allows different error probabilities for each position along the read, in conjunction with different error rates depending on the particular nucleotides involved in the substitution, and does not force these effects to behave in a multiplicative manner. The model provides error rates which capture the complex effects and interactions of the three main known causes of sequencing error associated with the Illumina platforms.
Project description:Several different nosological classifications have been used over time for vascular malformations (VMs) since clinical and pathological signs are largely overlapping. In a large proportion of cases, VMs are generated by somatic mosaicism in key genes, belonging to a few different molecular pathways. Therefore, molecular characterization may help in the understanding of the biological mechanisms related to the development of pathology. Tissue biopsy is not routinely included in the diagnostic path because of the need for fresh tissue specimens and the risk of bleeding. Bypassing the need for bioptic samples, we took advantage of the possibility of isolating cell-free DNA likely released by the affected tissues, to molecularly characterize 53 patients by cfDNA-NGS liquid biopsy. We found a good match between the identified variant and the clinical presentation. PIK3CA variants were found in 67% of Klippel Trenaunay Syndrome individuals; KRAS variants in 60% of arteriovenous malformations; MET was mutated in 75% of lymphovenous malformations. Our results demonstrate the power of cfDNA-NGS liquid biopsy in VMs clinical classification, diagnosis, and treatment. Indeed, tailored repurposing of pre-existing cancer drugs, such as PIK3CA, KRAS, and MET inhibitors, can be envisaged as adjuvant treatment, in addition to surgery and/or endovascular treatment, in the above-defined VMs categories, respectively.