Dataset Information

BLESS: bloom filter-based error correction solution for high-throughput sequencing reads.

ABSTRACT:

Motivation

Rapid advances in next-generation sequencing (NGS) technology have led to an exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting these errors. Unfortunately, all previous error correction methods required a large amount of memory, making them unsuitable for processing reads from large genomes on commodity computers.

Results

We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter, and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. Meanwhile, BLESS can extend reads like DNA assemblers to correct errors at the end of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors.
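To make the memory argument concrete, the following is a minimal sketch of a Bloom filter holding trusted k-mers. It is not the BLESS implementation. The class and function names, the hashing scheme (double hashing over SHA-256), and the rule that a k-mer is trusted when it occurs at least `min_count` times are all assumptions made for illustration. Only the sizing formulas are standard: for n items and a target false-positive rate p, the bit count is m = -n ln p / (ln 2)^2 and the number of hash functions is (m / n) ln 2.

```python
import hashlib
import math
from collections import Counter


class BloomFilter:
    """Illustrative Bloom filter; not the data structure shipped with BLESS."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, (m / n) ln 2 hash functions.
        n = max(1, expected_items)
        self.num_bits = max(
            1, math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
        )
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (assumed scheme): derive all bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every added item; may also be True for an item never added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read, k):
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_solid_kmer_filter(reads, k, min_count, false_positive_rate):
    """Store k-mers seen at least min_count times (assumed trust rule)."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    solid = [km for km, c in counts.items() if c >= min_count]
    bloom = BloomFilter(len(solid), false_positive_rate)
    for km in solid:
        bloom.add(km)
    return bloom
```

A read whose k-mers all test positive would be treated as error-free under this sketch, while a k-mer that tests negative is certainly absent from the trusted set. Because a Bloom filter never produces false negatives, accepting a higher false-positive rate shrinks the bit array at the cost of occasionally trusting an erroneous k-mer.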

Availability and implementation

Freely available at http://sourceforge.net/p/bless-ec

Contact

dchen@illinois.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Heo Y

PROVIDER: S-EPMC6365934 | biostudies-literature | 2014 May

REPOSITORIES: biostudies-literature

Publications

BLESS: bloom filter-based error correction solution for high-throughput sequencing reads.

Yun Heo, Xiao-Long Wu, Deming Chen, Jian Ma, Wen-Mei Hwu

Bioinformatics (Oxford, England), 2014-01-21, issue 10

PMID: 24451628

Similar Datasets

Improving transcriptome assembly through error correction of high-throughput sequence reads.
| S-EPMC3728768 | biostudies-literature

Error correction of high-throughput sequencing datasets with non-uniform coverage.
| S-EPMC3117386 | biostudies-literature

MeCorS: Metagenome-enabled error correction of single cell sequencing reads.
| S-EPMC4937190 | biostudies-literature

BLESS 2: accurate, memory-efficient and fast error correction method.
| S-EPMC6280799 | biostudies-literature

LCAT: an isoform-sensitive error correction for transcriptome sequencing long reads
| S-EPMC10245045 | biostudies-literature

Hybrid error correction and de novo assembly of single-molecule sequencing reads.
| S-EPMC3707490 | biostudies-literature

Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat.
| S-EPMC11659559 | biostudies-literature

Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly.
| S-EPMC5221426 | biostudies-literature

Identifying micro-inversions using high-throughput sequencing reads.
| S-EPMC4895285 | biostudies-literature

HALC: High throughput algorithm for long read error correction.
| S-EPMC5382505 | biostudies-literature