
Dataset Information


FLASH: fast length adjustment of short reads to improve genome assemblies.


ABSTRACT: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. Contact: t.magoc@gmail.com.
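The idea of overlapping and merging a read pair can be illustrated with a minimal Python sketch. This is not the FLASH implementation (which is written in C) and makes no claim to reproduce its scoring. It assumes the second read is given as sequenced, so it is reverse-complemented first, and it ignores base quality scores entirely. The function names and the parameters `min_overlap` and `max_mismatch_ratio`, including their default values, are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Complement table for the illustration; "N" marks an unknown base.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def merge_pair(
    read1: str,
    read2: str,
    min_overlap: int = 10,            # hypothetical parameter
    max_mismatch_ratio: float = 0.25,  # hypothetical parameter
) -> Optional[str]:
    """Merge a read pair if the end of read1 overlaps the start of
    the reverse-complemented read2; return None otherwise.

    Every overlap length from the longest possible down to
    min_overlap is scored by its mismatch ratio. The overlap with
    the lowest ratio wins; ties go to the longer overlap.
    """
    r2 = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    longest = min(len(read1), len(r2))
    for olen in range(longest, min_overlap - 1, -1):
        tail = read1[-olen:]
        head = r2[:olen]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        ratio = mismatches / olen
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None
    # Simplification: keep read1's bases in the overlap and append
    # the part of read2 that extends past it.
    return read1 + r2[best[1]:]


if __name__ == "__main__":
    # A 32-base fragment sequenced with 20-base reads from both ends,
    # so the two reads overlap by 8 bases.
    fragment = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
    r1 = fragment[:20]
    r2 = reverse_complement(fragment[-20:])
    print(merge_pair(r1, r2, min_overlap=5) == fragment)
```

The sketch only works when the fragment is shorter than twice the read length, which is the condition the abstract states; a pair with no acceptable overlap is returned as `None` and would be passed to the assembler unmerged.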

SUBMITTER: Magoc T 

PROVIDER: S-EPMC3198573 | biostudies-literature | 2011 Nov

REPOSITORIES: biostudies-literature


Publications

FLASH: fast length adjustment of short reads to improve genome assemblies.

Magoč T, Salzberg SL

Bioinformatics (Oxford, England), 2011-09-07, issue 21



Similar Datasets

| S-EPMC7506068 | biostudies-literature
| S-EPMC5870704 | biostudies-literature
| S-EPMC5481147 | biostudies-literature
| S-EPMC3113943 | biostudies-literature
| S-EPMC4699355 | biostudies-literature
| S-EPMC3424124 | biostudies-literature
| S-EPMC8092372 | biostudies-literature
| S-EPMC2884544 | biostudies-literature
| S-EPMC3527383 | biostudies-literature