
Dataset Information


Fast-SG: an alignment-free algorithm for hybrid assembly.


ABSTRACT: Background: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Results: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Conclusions: Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.

SUBMITTER: Di Genova A 

PROVIDER: S-EPMC6007556 | biostudies-literature | 2018 May

REPOSITORIES: biostudies-literature


Publications

Fast-SG: an alignment-free algorithm for hybrid assembly.

Alex Di Genova, Gonzalo A. Ruz, Marie-France Sagot, Alejandro Maass

GigaScience, 2018-05-01, issue 5



Similar Datasets

| S-EPMC3638164 | biostudies-other
| S-EPMC8355039 | biostudies-literature
| S-EPMC3999979 | biostudies-literature
| S-EPMC3934876 | biostudies-literature
| S-EPMC4080745 | biostudies-literature
| S-EPMC7419660 | biostudies-literature
| S-EPMC7859483 | biostudies-literature
| S-EPMC5946935 | biostudies-literature
| S-EPMC6374904 | biostudies-literature
| S-EPMC4395267 | biostudies-literature