
Dataset Information


Completing bacterial genome assemblies: strategy and performance comparisons.


ABSTRACT: Determining the genomic sequences of microorganisms is the basis and prerequisite for understanding their biology and functional characterization. While the advent of low-cost, extremely high-throughput second-generation sequencing technologies and the parallel development of assembly algorithms have generated rapid and cost-effective genome assemblies, such assemblies are often unfinished, fragmented draft genomes as a result of short read lengths and long repeats present in multiple copies. Third-generation, PacBio sequencing technologies circumvented this problem by greatly increasing read length. Hybrid approaches including ALLPATHS-LG, PacBio corrected reads pipeline, SPAdes, and SSPACE-LongRead, and non-hybrid approaches--hierarchical genome-assembly process (HGAP) and PacBio corrected reads pipeline via self-correction--have therefore been proposed to utilize the PacBio long reads that can span many thousands of bases to facilitate the assembly of complete microbial genomes. However, standardized procedures that aim at evaluating and comparing these approaches are currently insufficient. To address the issue, we herein provide a comprehensive comparison by collecting datasets for the comparative assessment on the above-mentioned five assemblers. In addition to offering explicit and beneficial recommendations to practitioners, this study aims to aid in the design of a paradigm positioned to complete bacterial genome assembly.

SUBMITTER: Liao YC 

PROVIDER: S-EPMC4348652 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Completing bacterial genome assemblies: strategy and performance comparisons.

Liao Yu-Chieh, Lin Shu-Hung, Lin Hsin-Hung

Scientific Reports, 2015-03-04



Similar Datasets

S-EPMC5695209 | biostudies-literature
S-EPMC5056350 | biostudies-literature
S-EPMC5321748 | biostudies-literature
S-EPMC6737777 | biostudies-literature
S-EPMC5481147 | biostudies-literature
S-EPMC99031 | biostudies-literature
S-EPMC4331810 | biostudies-literature
PRJEB2351 | ENA
PRJEB2342 | ENA
S-EPMC3624806 | biostudies-other