
Dataset Information


HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads.


ABSTRACT: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high-quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert sizes, the latter are costly to generate. Recently, the GAGE-B study showed that remarkably good assembly quality can be obtained for bacterial genomes by state-of-the-art assemblers run on a single short-insert library with very high coverage. In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining the assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads. We empirically evaluated this methodology for 8 leading assemblers using 7 GAGE-B bacterial datasets consisting of 100 bp Illumina HiSeq and 250 bp Illumina MiSeq reads, with coverage ranging from 100x to ~200x. The results show that, for all evaluated datasets and with most of the evaluated assemblers (those used to assemble the disjoint subsets), HGA leads to a significant improvement in assembly quality based on the N50 and corrected N50 metrics.
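
The first HGA step (splitting reads into disjoint subsets) and the N50 metric used for evaluation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `partition_reads` and `n50` are hypothetical, the round-robin split is an assumption (the abstract does not say how subsets are chosen), and the subset assembly and re-assembly steps, which rely on external assemblers, are not shown. Corrected N50 is also omitted because it requires alignment against a reference genome.

```python
from typing import List, Sequence


def partition_reads(reads: Sequence[str], k: int) -> List[List[str]]:
    """Split reads into k disjoint subsets (round-robin; an assumed scheme).

    Each element of `reads` stands for one read, or one read pair if the
    library is paired-end, so that mates stay in the same subset.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    subsets: List[List[str]] = [[] for _ in range(k)]
    for i, read in enumerate(reads):
        subsets[i % k].append(read)
    return subsets


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    subsets = partition_reads(["r1", "r2", "r3", "r4", "r5"], k=2)
    print(subsets)
    print(n50([2, 3, 4, 5, 6]))
```

With a 200x library and `k=4`, each subset would hold roughly 50x coverage, which is the kind of split the hierarchical approach relies on; the appropriate value of `k` is dataset-dependent and is not specified here.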

SUBMITTER: Al-Okaily AA 

PROVIDER: S-EPMC4779561 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature


Publications

HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads.

Al-Okaily, Anas A.

BMC Genomics, 2016-03-05



Similar Datasets

| S-EPMC7544192 | biostudies-literature
| S-EPMC4120091 | biostudies-literature
| S-EPMC3076424 | biostudies-literature
| S-EPMC2813482 | biostudies-literature
| S-EPMC3158087 | biostudies-literature
| S-EPMC3558281 | biostudies-literature
2022-07-21 | PXD031941 | Pride
| S-EPMC8549298 | biostudies-literature
| S-EPMC9330293 | biostudies-literature
| S-EPMC8306402 | biostudies-literature