Dataset Information

Next generation sequencing reads comparison with an alignment-free distance.

ABSTRACT: Next Generation Sequencing (NGS) machines extract from a biological sample a large number of short DNA fragments (reads). These reads are then used for several applications, e.g., sequence reconstruction, DNA assembly, gene expression profiling, mutation analysis.

We propose a method to evaluate the similarity between reads. This method does not rely on the alignment of the reads and it is based on the distance between the frequencies of their substrings of fixed dimensions (k-mers). We compare this alignment-free distance with the similarity measures derived from two alignment methods: Needleman-Wunsch and Blast. The comparison is based on a simple assumption: the most correct distance is obtained by knowing in advance the reference sequence. Therefore, we first align the reads on the original DNA sequence, compute the overlap between the aligned reads, and use this overlap as an ideal distance. We then verify how the alignment-free and the alignment-based distances reproduce this ideal distance. The ability of correctly reproducing the ideal distance is evaluated over samples of read pairs from Saccharomyces cerevisiae, Escherichia coli, and Homo sapiens. The comparison is based on the correctness of threshold predictors cross-validated over different samples.

We exhibit experimental evidence that the proposed alignment-free distance is a potentially useful read-to-read distance measure and performs better than the more time consuming distances based on alignment.

Alignment-free distances may be used effectively for reads comparison, and may provide a significant speed-up in several processes based on NGS sequencing (e.g., DNA assembly, reads classification).
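The k-mer frequency comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which distance is applied to the frequency vectors or which value of k is used, so the Euclidean distance and the default `k=3` here are assumptions chosen for illustration. The function names `kmer_frequencies` and `alignment_free_distance` are hypothetical and do not come from the authors' software.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(read, k):
    """Return the relative frequency of each overlapping k-mer in a read."""
    read = read.upper()
    total = len(read) - k + 1  # number of overlapping k-mers in the read
    if total <= 0:
        raise ValueError("read is shorter than k")
    counts = Counter(read[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def alignment_free_distance(read_a, read_b, k=3):
    """Euclidean distance between the k-mer frequency vectors of two reads.

    Assumption: the source only says "distance between the frequencies";
    Euclidean distance is used here as one plausible choice.
    """
    freq_a = kmer_frequencies(read_a, k)
    freq_b = kmer_frequencies(read_b, k)
    # k-mers absent from a read have frequency 0 in that read.
    kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2
                    for m in kmers))


if __name__ == "__main__":
    print(alignment_free_distance("ACGTACGTAC", "ACGTACGTAC"))  # identical reads
    print(alignment_free_distance("ACGTACGTAC", "TTTTTTTTTT"))  # no shared k-mers
```

No alignment is computed at any point: each read is scanned once to count its k-mers, which is where the speed-up over alignment-based distances would be expected to come from.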

SUBMITTER: Weitschek E

PROVIDER: S-EPMC4265526 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Next generation sequencing reads comparison with an alignment-free distance.

Weitschek E, Santoni D, Fiscon G, De Cola MC, Bertolazzi P, Felici G

BMC Research Notes, 2014-12-03

PMID: 25465386

Similar Datasets

Alignment-free sequence comparison based on next-generation sequencing reads. | S-EPMC3581251 | biostudies-literature

GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads. | S-EPMC10584998 | biostudies-literature

Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns. | S-EPMC4168702 | biostudies-literature

New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. | S-EPMC4017329 | biostudies-literature

Comparison of sequence reads obtained from three next-generation sequencing platforms. | S-EPMC3096631 | biostudies-literature

Reads Binning Improves Alignment-Free Metagenome Comparison. | S-EPMC6881972 | biostudies-literature

AdapterRemoval: easy cleaning of next-generation sequencing reads. | S-EPMC3532080 | biostudies-literature

FastProNGS: fast preprocessing of next-generation sequencing reads. | S-EPMC6580563 | biostudies-literature

An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data. | S-EPMC4501066 | biostudies-literature

A survey of sequence alignment algorithms for next-generation sequencing. | S-EPMC2943993 | biostudies-literature