Dataset Information

SEQuel: improving the accuracy of genome assemblies.


ABSTRACT: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly. SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.
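
The positional de Bruijn graph mentioned in the abstract can be illustrated with a minimal sketch. This is not SEQuel's implementation: the function names, the `delta` tolerance parameter, and the greedy gluing rule are assumptions made for illustration only. The idea shown is that each k-mer is paired with the approximate position at which its read aligns to the contig, and two occurrences of the same k-mer are merged into one node only when their positions are close.

```python
from collections import defaultdict


def positional_kmers(read, start, k):
    """Yield (kmer, approximate_position) pairs for one read.

    `start` is the approximate position of the read on the contig
    (assumed to come from an alignment step not shown here).
    """
    for i in range(len(read) - k + 1):
        yield read[i:i + k], start + i


def build_positional_de_bruijn(reads, k, delta):
    """Build an illustrative positional de Bruijn graph.

    reads: iterable of (sequence, approximate_start) pairs.
    delta: hypothetical tolerance; occurrences of the same k-mer whose
           positions differ by at most delta share one node.
    Returns a dict mapping (node, node) edges to their multiplicity,
    where a node is a (kmer, representative_position) tuple.
    """
    representatives = defaultdict(list)  # kmer -> positions already used as nodes
    edges = defaultdict(int)

    def node_for(kmer, pos):
        # Greedy gluing: reuse the first existing node within delta.
        for rep in representatives[kmer]:
            if abs(rep - pos) <= delta:
                return (kmer, rep)
        representatives[kmer].append(pos)
        return (kmer, pos)

    for seq, start in reads:
        prev = None
        for kmer, pos in positional_kmers(seq, start, k):
            node = node_for(kmer, pos)
            if prev is not None:
                edges[(prev, node)] += 1
            prev = node
    return dict(edges)


if __name__ == "__main__":
    toy_reads = [("ACGTAC", 0), ("CGTACG", 1)]
    for (u, v), count in sorted(build_positional_de_bruijn(toy_reads, k=3, delta=1).items()):
        print(u, "->", v, "x", count)
```

In the toy input, the k-mer `ACG` occurs at positions 0 and 4. A standard de Bruijn graph would glue both occurrences into a single node and create a cycle, whereas the positional variant keeps them apart because their positions differ by more than `delta`.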

SUBMITTER: Ronen R 

PROVIDER: S-EPMC3371851 | biostudies-literature | 2012 Jun

REPOSITORIES: biostudies-literature

Publications

SEQuel: improving the accuracy of genome assemblies.

Ronen R, Boucher C, Chitsaz H, Pevzner P

Bioinformatics (Oxford, England), 2012 Jun 1, issue 12


Similar Datasets

| S-EPMC3814395 | biostudies-literature
| S-EPMC7794651 | biostudies-literature
| S-EPMC7649008 | biostudies-literature
| S-EPMC6678273 | biostudies-literature
| S-EPMC3963961 | biostudies-literature
| S-EPMC7000513 | biostudies-literature
| S-EPMC7782250 | biostudies-literature
| S-EPMC9059177 | biostudies-literature
| S-EPMC9350791 | biostudies-literature
| S-EPMC4726296 | biostudies-literature