Dataset Information

De novo finished 2.8 Mbp Staphylococcus aureus genome assembly from 100 bp short and long range paired-end reads.


ABSTRACT:

MOTIVATION: Paired-end sequencing circumvents the shortness of the reads produced by second-generation sequencers and is essential for de novo assembly of genomes. However, obtaining a finished genome from short reads is still an open challenge. We present an algorithm that exploits the pairing information derived from inserts of potentially any length. The method determines paths through an overlap graph by using a constrained search tree. We also present a method that automatically determines suitable overlap cutoffs according to the contextual coverage, thus reducing the need for manual parameterization. Finally, we introduce an interactive mode that allows querying an assembly at targeted regions.

RESULTS: We assess our methods by assembling two Staphylococcus aureus strains that were sequenced on the Illumina platform. Using 100 bp paired-end reads and minimal manual curation, we produce a finished genome sequence for the previously undescribed isolate SGH-10-168.

AVAILABILITY AND IMPLEMENTATION: The presented algorithms are implemented in the standalone Edena software, freely available under the General Public License (GPLv3) at www.genomic.ch/edena.php.

SUBMITTER: Hernandez D 

PROVIDER: S-EPMC6280916 | biostudies-literature | 2014 Jan

REPOSITORIES: biostudies-literature

Publications

De novo finished 2.8 Mbp Staphylococcus aureus genome assembly from 100 bp short and long range paired-end reads.

David Hernandez, Ryan Tewhey, Jean-Baptiste Veyrieras, Laurent Farinelli, Magne Østerås, Patrice François, Jacques Schrenzel

Bioinformatics (Oxford, England), 2013-10-15, issue 1


Similar Datasets

Date | Accession | Repository
 | S-EPMC3158087 | biostudies-literature
 | S-EPMC3076424 | biostudies-literature
 | S-EPMC3614465 | biostudies-other
 | S-EPMC7168855 | biostudies-literature
2010-07-13 | E-GEOD-22765 | biostudies-arrayexpress
2010-07-13 | GSE22765 | GEO
 | PRJEB4485 | ENA
 | S-EPMC5830760 | biostudies-literature
 | S-EPMC10231564 | biostudies-literature
 | S-EPMC3919575 | biostudies-literature