Dataset Information

Hercules: a profile HMM-based hybrid error correction algorithm for long reads.

ABSTRACT: The choice between second- and third-generation sequencing platforms involves a trade-off between accuracy and read length. Several types of studies require reads that are both long and accurate. In such cases, researchers often combine the two technologies and use the short reads to correct the erroneous long reads. Current approaches rely on various graph- or alignment-based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to integrate the two technologies more accurately. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile hidden Markov model (HMM) with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read and uses it to correct errors in that read. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few algorithms that scale, and among those it achieves the highest accuracy.
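
The core idea of learning per-read posterior emission probabilities from short-read evidence can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the Hercules implementation: it assumes ACGT-only reads, a single substitution rate (`sub_rate`) and skip probability (`skip_prob`) standing in for the platform's error profile, and that each short read's start position on the long read is already known from a prior alignment. All function and parameter names are hypothetical. Each long-read base gets one match state, with an extra i -> i+2 edge as a simple deletion path; the forward-backward algorithm gives each short read's posterior state occupancy, which is added to per-state emission counts. Unlike Hercules, which learns both transition and emission probabilities, this sketch keeps transitions fixed and corrects substitutions only.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"  # assumption: reads contain only A, C, G, T


def emission_prob(read_base: str, observed: str, sub_rate: float) -> float:
    """Initial emission: a state emits its own long-read base w.p. 1 - sub_rate."""
    return 1.0 - sub_rate if observed == read_base else sub_rate / 3.0


def forward_backward(long_read: str, short_read: str, start: int,
                     sub_rate: float, skip_prob: float) -> List[Dict[int, float]]:
    """Posterior occupancy gamma[t][i] of state i at short-read position t.

    Hypothetical helper; the short read is assumed to start in state `start`.
    """
    n, T = len(long_read), len(short_read)
    if T == 0 or not 0 <= start < n:
        return []
    # Left-to-right moves: i -> i+1 (match) or i -> i+2 (skip one long-read base).
    moves = ((1, 1.0 - skip_prob), (2, skip_prob))

    def b(i: int, t: int) -> float:
        return emission_prob(long_read[i], short_read[t], sub_rate)

    # Forward pass: alpha[t][j] = P(o_0..o_t, state_t = j).
    alpha: List[Dict[int, float]] = [dict() for _ in range(T)]
    alpha[0][start] = b(start, 0)
    for t in range(1, T):
        for i, a in alpha[t - 1].items():
            for step, p in moves:
                j = i + step
                if j < n:  # mass running off the end of the long read is dropped
                    alpha[t][j] = alpha[t].get(j, 0.0) + a * p * b(j, t)

    # Backward pass: beta[t][i] = P(o_{t+1}..o_{T-1} | state_t = i).
    beta: List[Dict[int, float]] = [dict() for _ in range(T)]
    beta[T - 1] = {i: 1.0 for i in alpha[T - 1]}
    for t in range(T - 2, -1, -1):
        for i in alpha[t]:
            total = 0.0
            for step, p in moves:
                j = i + step
                if j in beta[t + 1]:
                    total += p * b(j, t + 1) * beta[t + 1][j]
            beta[t][i] = total

    likelihood = sum(alpha[T - 1].values())
    if likelihood == 0.0:
        return []
    return [{i: alpha[t][i] * beta[t][i] / likelihood for i in alpha[t]}
            for t in range(T)]


def correct(long_read: str, aligned: List[Tuple[str, int]], sub_rate: float = 0.1,
            skip_prob: float = 0.05, prior_weight: float = 1.0) -> str:
    """Re-estimate each state's emission distribution and decode by argmax."""
    # Prior pseudo-counts: each state initially believes its own long-read base.
    counts = [{base: (prior_weight if base == rb else 0.0) for base in BASES}
              for rb in long_read]
    for short_read, start in aligned:
        gamma = forward_backward(long_read, short_read, start, sub_rate, skip_prob)
        for t, occupancy in enumerate(gamma):
            for i, g in occupancy.items():
                counts[i][short_read[t]] += g  # expected emission counts (E-step)
    # Normalizing counts[i] gives state i's posterior emission; take its argmax.
    return "".join(max(BASES, key=lambda base: c[base]) for c in counts)
```

For example, with the long read `ACGTTCGTAC` (a T where the true sequence has an A at position 4) and the short reads `("CGTAC", 1)`, `("GTACG", 2)` and `("TACGT", 3)`, `correct` returns `ACGTACGTAC`: three short reads outvote the single prior pseudo-count at that position. A full implementation would iterate these updates, also re-estimate transitions, and decode insertions and deletions.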

SUBMITTER: Firtina C

PROVIDER: S-EPMC6265270 | biostudies-literature | 2018 Nov

REPOSITORIES: biostudies-literature

Publications

Hercules: a profile HMM-based hybrid error correction algorithm for long reads.

Firtina C, Bar-Joseph Z, Alkan C, Cicek AE

Nucleic Acids Research, 2018 Nov 1, issue 21

PMID: 30124947

Similar Datasets

HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning.
| S-EPMC6028576 | biostudies-literature

A hybrid and scalable error correction algorithm for indel and substitution errors of long reads.
| S-EPMC6923905 | biostudies-literature

Performance difference of graph-based and alignment-based hybrid error correction methods for error-prone long reads.
| S-EPMC6966875 | biostudies-literature

A comparative evaluation of hybrid error correction methods for error-prone long reads.
| S-EPMC6362602 | biostudies-literature

Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly.
| S-EPMC7792008 | biostudies-literature

EC: an efficient error correction algorithm for short reads.
| S-EPMC4674864 | biostudies-literature

The draft genome of MD-2 pineapple using hybrid error correction of long reads.
| S-EPMC5066169 | biostudies-literature

NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads.
| S-EPMC4403973 | biostudies-literature

Hybrid-hybrid correction of errors in long reads with HERO.
| S-EPMC10690975 | biostudies-literature

Hybrid error correction and de novo assembly of single-molecule sequencing reads.
| S-EPMC3707490 | biostudies-literature