Dataset Information

Accurate self-correction of errors in long reads using de Bruijn graphs.

ABSTRACT:

Motivation

New long-read sequencing technologies, such as PacBio SMRT and Oxford Nanopore, can produce reads up to 50 000 bp long, but with an error rate of at least 15%. Reducing the error rate is necessary before the reads can be used in, e.g., de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second-generation sequencing technologies to correct the long reads.

Results

We present an error correction method that uses long reads only. The method consists of two phases: first, an iterative, alignment-free correction based on de Bruijn graphs with increasing k-mer length; second, further polishing of the corrected reads using long-distance dependencies found through multiple alignments. According to our experiments, for read sets with high coverage the proposed method is the most accurate of the methods relying on long reads only. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.
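The first phase (alignment-free correction on de Bruijn graphs with increasing k) can be illustrated with a toy sketch. This is not LoRMA's implementation. It assumes a node-centric graph whose nodes are "solid" k-mers (those occurring at least `min_count` times in the read set). Each weak region between two solid k-mers of a read is bridged by an exhaustive, length-bounded path search, and the path closest to the read in edit distance is kept. The whole pass is then repeated with a larger k. The function names, the k values, the threshold and the `slack` factor are hypothetical choices for illustration, and the multiple-alignment polishing phase is not shown.

```python
from collections import Counter


def solid_kmers(reads, k, min_count):
    """Count every k-mer in the reads; keep those seen at least min_count times."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def edit_distance(a, b):
    """Levenshtein distance computed with one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def bridge(graph, start, end, max_len):
    """Spell every graph path from k-mer `start` to k-mer `end` of at most
    max_len characters. Nodes are solid k-mers; an edge joins two k-mers that
    overlap by k-1 characters. Exhaustive search: only suitable for toy data."""
    paths = []
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        if node == end and len(seq) > len(start):
            paths.append(seq)
        if len(seq) >= max_len:
            continue
        for base in "ACGT":
            nxt = node[1:] + base
            if nxt in graph:
                stack.append((nxt, seq + base))
    return paths


def correct_read(read, graph, k, slack=1.3):
    """Replace each weak region flanked by two solid k-mers with the graph path
    closest to it in edit distance; unanchored read ends are left unchanged."""
    pos = [i for i in range(len(read) - k + 1) if read[i:i + k] in graph]
    if not pos:
        return read
    out = read[:pos[0] + k]
    for p, q in zip(pos, pos[1:]):
        if q == p + 1:  # consecutive solid k-mers: keep the read's next base
            out += read[q + k - 1]
            continue
        segment = read[p:q + k]  # weak region plus both flanking solid k-mers
        candidates = bridge(graph, read[p:p + k], read[q:q + k],
                            int(len(segment) * slack) + 1)
        if candidates:
            best = min(candidates, key=lambda s: edit_distance(s, segment))
            out += best[k:]  # out already ends with the left solid k-mer
        else:  # no bridging path found: keep the original bases
            out += read[p + k:q + k]
    return out + read[pos[-1] + k:]


def iterative_correct(reads, ks=(5, 7, 9), min_count=3):
    """Run correct_read over all reads once per k; each round's graph is
    built from the output of the previous round."""
    for k in ks:
        graph = solid_kmers(reads, k, min_count)
        reads = [correct_read(r, graph, k) for r in reads]
    return reads
```

For example, given five exact copies of a short sequence whose 4-mers are all distinct, plus one copy with a substitution and one with an inserted base in the middle, the erroneous k-mers occur too rarely to be solid, the only bridging path is the true sequence, and `iterative_correct` restores both faulty copies. Increasing k in later rounds lets longer k-mers resolve repeats that short ones cannot, which is the motivation for the iteration; a practical tool would need a bounded or heuristic path search instead of full enumeration.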

Availability and implementation

LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/ .

Contact

leena.salmela@cs.helsinki.fi.

SUBMITTER: Salmela L

PROVIDER: S-EPMC5351550 | biostudies-literature | 2017 Mar

REPOSITORIES: biostudies-literature

Publications

Accurate self-correction of errors in long reads using de Bruijn graphs.

Salmela L, Walve R, Rivals E, Ukkonen E

Bioinformatics (Oxford, England), 2017 Mar 1, issue 6

PMID: 27273673

Similar Datasets

Efficient High-Quality Metagenome Assembly from Long Accurate Reads using Minimizer-space de Bruijn Graphs.
| S-EPMC10541625 | biostudies-literature

cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs.
| S-EPMC6612831 | biostudies-literature

Assembly of long error-prone reads using de Bruijn graphs.
| S-EPMC5206522 | biostudies-literature

VeChat: correcting errors in long reads using variation graphs.
| S-EPMC9636371 | biostudies-literature

Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer.
| S-EPMC8562525 | biostudies-literature

BrownieAligner: accurate alignment of Illumina sequencing data to de Bruijn graphs.
| S-EPMC6122196 | biostudies-literature

Integrating long-range connectivity information into de Bruijn graphs.
| S-EPMC6061703 | biostudies-literature

Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields.
| S-EPMC7491180 | biostudies-literature

Succinct colored de Bruijn graphs.
| S-EPMC5872255 | biostudies-literature

De novo assembly and genotyping of variants using colored de Bruijn graphs.
| S-EPMC3272472 | biostudies-literature