Dataset Information

Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns.


ABSTRACT:

BACKGROUND: With the advent of Next-Generation Sequencing (NGS) technologies, a large amount of short-read data has been generated. If a reference genome is not available, the assembly of a template sequence is usually challenging because of repeats and the short length of reads. When NGS reads cannot be mapped onto a reference genome, alignment-based methods are not applicable. However, it is still possible to study the evolutionary relationship of unassembled genomes based on NGS data.

RESULTS: We present a parameter-free, alignment-free method, called Under2, based on variable-length patterns, for the direct comparison of sets of NGS reads. We define a similarity measure using variable-length patterns, as well as their reverses and reverse-complements, along with their statistical and syntactical properties. We evaluate several alignment-free statistics on the comparison of NGS reads coming from simulated and real genomes. In almost all simulations, our method Under2 outperforms all other statistics. The performance gain becomes more evident when real genomes are used.

CONCLUSION: The new alignment-free statistic is highly successful in discriminating related genomes based on NGS read data. In almost all experiments, it outperforms traditional alignment-free statistics that are based on fixed-length patterns.
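For orientation, the kind of traditional fixed-length statistic that the abstract contrasts with Under2 can be sketched in a few lines. The example below is not Under2 and is not taken from the paper: it is a minimal illustration of the classic D2 statistic (the inner product of k-mer count vectors) computed directly on two sets of reads, with each read counted on both strands. The function names, the choice of `k`, and the decision to skip k-mers containing non-ACGT characters are all assumptions made for this sketch.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes upper-case ACGT).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ALPHABET = set("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(reads, k):
    """Count fixed-length k-mers over a set of reads, on both strands.

    Hypothetical helper: k-mers containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        for strand in (read, reverse_complement(read)):
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                if set(kmer) <= _ALPHABET:
                    counts[kmer] += 1
    return counts


def d2(counts_a, counts_b):
    """Classic D2 statistic: sum over shared k-mers of the count products."""
    return sum(c * counts_b.get(kmer, 0) for kmer, c in counts_a.items())


if __name__ == "__main__":
    sample_a = ["ACGTACGT", "TTGACA"]   # toy reads, not real data
    sample_b = ["ACGTTGAC", "GGGCCC"]
    k = 3                               # illustrative choice of k
    print(d2(kmer_counts(sample_a, k), kmer_counts(sample_b, k)))
```

A fixed-length statistic like this needs `k` chosen in advance, which is the parameter dependence that a variable-length, parameter-free approach is meant to avoid.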

SUBMITTER: Comin M 

PROVIDER: S-EPMC4168702 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns.

Matteo Comin, Michele Schimd

BMC Bioinformatics, 2014-09-10

Similar Datasets

| S-EPMC4265526 | biostudies-literature
| S-EPMC3581251 | biostudies-literature
| S-EPMC3096631 | biostudies-literature
| S-EPMC3526293 | biostudies-literature
| S-EPMC3726674 | biostudies-literature
| S-EPMC6580563 | biostudies-literature
| S-EPMC3532080 | biostudies-literature
| S-EPMC3031631 | biostudies-other
| S-EPMC4682372 | biostudies-literature
| S-EPMC5120338 | biostudies-literature