Dataset Information

EC: an efficient error correction algorithm for short reads.

ABSTRACT:

Background

Highly parallel next-generation sequencing (NGS) techniques produce millions to billions of short reads from a genomic sequence in a single run. Because of the limitations of NGS technologies, the reads may contain errors. The error rate can be reduced by trimming the reads and by correcting their erroneous bases. Correction yields higher-quality data, and the computational complexity of many biological applications is greatly reduced if the reads are corrected first. We have developed a novel error correction algorithm called EC and compared it with four other state-of-the-art algorithms using both real and simulated sequencing reads.

Results

Extensive and rigorous experiments show that EC is an effective, scalable, and efficient error correction tool. The real reads used in our performance evaluation are Illumina-generated short reads of various lengths; the six experimental datasets are taken from the Sequence Read Archive (SRA) at NCBI. The simulated reads are obtained by picking substrings from random positions of reference genomes. To introduce errors, some bases of the simulated reads are changed to other bases with certain probabilities.
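The read simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the function name `simulate_reads`, its parameters, the single per-base `error_rate`, and the uniform choice among the three alternative bases are all hypothetical, since the abstract does not specify the probabilities used.

```python
import random

BASES = "ACGT"


def simulate_reads(reference, read_length, num_reads, error_rate, seed=0):
    """Return a list of (true_read, noisy_read) pairs.

    Assumptions (not from the paper): every base is substituted
    independently with probability `error_rate`, and the replacement
    is drawn uniformly from the other bases.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_reads):
        # Pick a substring from a random position of the reference.
        start = rng.randrange(len(reference) - read_length + 1)
        true_read = reference[start:start + read_length]
        noisy = []
        for base in true_read:
            if rng.random() < error_rate:
                # Substitution error: change the base to a different one.
                noisy.append(rng.choice([b for b in BASES if b != base]))
            else:
                noisy.append(base)
        pairs.append((true_read, "".join(noisy)))
    return pairs


if __name__ == "__main__":
    reference = "ACGTTGCAAGCTTAGGCTAACGTTAGC"  # toy reference, illustration only
    for true_read, noisy_read in simulate_reads(reference, 10, 3, 0.1):
        print(true_read, noisy_read)
```

Keeping the error-free read next to each noisy read makes it straightforward to score a corrector on simulated data, because the true base at every position is known.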

Conclusions

Error correction is a vital problem in biology, especially for NGS data. In this paper we present a novel algorithm, called Error Corrector (EC), for correcting substitution errors in biological sequencing reads. We plan to investigate whether the techniques introduced in this paper can also be used to handle insertion and deletion errors.

Software availability

The implementation is freely available for non-commercial purposes. It can be downloaded from: http://engr.uconn.edu/~rajasek/EC.zip.

SUBMITTER: Saha S

PROVIDER: S-EPMC4674864 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

EC: an efficient error correction algorithm for short reads.

Subrata Saha, Sanguthevar Rajasekaran

BMC Bioinformatics, 2015-12-07

PMID: 26678663

Similar Datasets

NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads. | S-EPMC4403973 | biostudies-literature
An efficient error correction algorithm using FM-index. | S-EPMC5704532 | biostudies-literature
HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning. | S-EPMC6028576 | biostudies-literature
Hercules: a profile HMM-based hybrid error correction algorithm for long reads. | S-EPMC6265270 | biostudies-literature
Instance-based error correction for short reads of disease-associated genes. | S-EPMC8170817 | biostudies-literature
Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads. | S-EPMC4615873 | biostudies-literature
ECHO: a reference-free short-read error correction algorithm. | S-EPMC3129260 | biostudies-literature
Efficient assembly of nanopore reads via highly accurate and intact error correction. | S-EPMC7782737 | biostudies-literature
A hybrid and scalable error correction algorithm for indel and substitution errors of long reads. | S-EPMC6923905 | biostudies-literature
Estimation of sequencing error rates in short reads. | S-EPMC3495688 | biostudies-literature