Dataset Information

An efficient error correction algorithm using FM-index.


ABSTRACT: High-throughput sequencing offers higher throughput and lower cost for sequencing a genome. However, sequencing errors, including mismatches and indels, may be produced during sequencing. Because errors may reduce the accuracy of subsequent de novo assembly, error correction is necessary prior to assembly. However, existing correction methods still face trade-offs among correction power, accuracy, and speed. We develop a novel overlap-based error correction algorithm using FM-index (called FMOE). FMOE first identifies overlapping reads by aligning a query read simultaneously against multiple reads compressed by FM-index. Subsequently, sequencing errors are corrected by k-mer voting from overlapping reads only. The experimental results indicate that FMOE has the highest correction power with comparable accuracy and speed. Our algorithm performs better on long-read than on short-read datasets when compared with others. The assembly results indicate that each algorithm has its own strengths and weaknesses, whereas FMOE is well suited to long or good-quality reads. FMOE is freely available at https://github.com/ythuang0522/FMOC.
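
The abstract does not describe FMOE's internal data structures, so the following is only a toy sketch of the general idea behind searching many reads at once with an FM-index. It is not the authors' implementation. The function names `build_fm_index` and `backward_search`, the `#` read separator, and the naive suffix-array construction are all assumptions made for illustration; a real tool would use a compressed index and a far more efficient construction.

```python
def build_fm_index(text):
    """Build a toy FM-index for `text`, which must end in a unique '$'.

    Illustrative only: the suffix array is built by naive sorting,
    which is far too slow for real sequencing data.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] is '$' for i == 0)
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that sort strictly before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return the sorted start positions of `pattern` in the indexed text."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, joining the hypothetical reads `ACGTAC`, `GTACGG`, and `TTACGA` into `ACGTAC#GTACGG#TTACGA$` and searching for the seed `TAC` finds its occurrence in all three reads with a single backward search. Those reads would then be candidates for overlap with a query read containing that seed.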

SUBMITTER: Huang YT 

PROVIDER: S-EPMC5704532 | biostudies-literature | 2017 Nov

REPOSITORIES: biostudies-literature

Publications

An efficient error correction algorithm using FM-index.

Huang Yao-Ting (YT), Huang Yu-Wen (YW)

BMC bioinformatics 20171128 1


Similar Datasets

| S-EPMC4674864 | biostudies-literature
| S-EPMC7580893 | biostudies-literature
| S-EPMC3129260 | biostudies-literature
| S-EPMC5382505 | biostudies-literature
| S-EPMC3169665 | biostudies-literature
| S-EPMC4403973 | biostudies-literature
| S-EPMC4253826 | biostudies-other
| S-EPMC4393065 | biostudies-literature
| S-EPMC4248469 | biostudies-literature
| S-EPMC3382444 | biostudies-literature