Dataset Information

Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values.

ABSTRACT: Sequencing technologies are generating enormous amounts of read data, yet the assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics, known as alignment-free measures, which do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads, but with an error rate of around 15%. In this context it will be fundamental to exploit quality-value information within the framework of alignment-free measures. We present a family of alignment-free measures, called d^q-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationships of genomes can be reconstructed by directly comparing their read sets. On average, the use of quality values improves classification accuracy, and its contribution increases when the reads are noisier. The comparison of metagenomic microbial communities can also be performed efficiently: similar metagenomes are quickly detected just by processing their read data, without the need for costly alignments.
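
The abstract does not spell out how quality values enter the k-mer counts, so the following is only a minimal sketch of the general idea, not the authors' definition of the d^q-type measures. It assumes that each k-mer occurrence is weighted by the probability that all of its bases were called correctly (derived from Phred scores), and that two read sets are compared with a plain D2-style inner product of the weighted counts. The function names `phred_to_prob_correct`, `weighted_kmer_counts`, and `d2_similarity` are hypothetical.

```python
from collections import defaultdict


def phred_to_prob_correct(q):
    """Probability that a base with Phred score q was called correctly."""
    return 1.0 - 10 ** (-q / 10.0)


def weighted_kmer_counts(reads, k):
    """Quality-weighted k-mer counts for a read set.

    reads: iterable of (sequence, phred_scores) pairs, where phred_scores
    is a list of integers of the same length as the sequence.
    Assumption (not taken from the record): an occurrence contributes the
    product of the per-base probabilities of being correct, instead of 1.
    """
    counts = defaultdict(float)
    for seq, quals in reads:
        probs = [phred_to_prob_correct(q) for q in quals]
        for i in range(len(seq) - k + 1):
            weight = 1.0
            for p in probs[i:i + k]:
                weight *= p
            counts[seq[i:i + k]] += weight
    return dict(counts)


def d2_similarity(counts_a, counts_b):
    """D2-style similarity: inner product of two k-mer count vectors."""
    return sum(v * counts_b.get(kmer, 0.0) for kmer, v in counts_a.items())


# Toy usage with made-up reads and quality scores.
sample_a = [("ACGTAC", [30, 30, 30, 30, 30, 30])]
sample_b = [("ACGTTT", [30, 30, 10, 10, 30, 30])]
similarity = d2_similarity(
    weighted_kmer_counts(sample_a, 3),
    weighted_kmer_counts(sample_b, 3),
)
```

Under this weighting, k-mers that overlap low-quality bases contribute less to the similarity, which matches the abstract's observation that quality values matter more as reads become noisier. The published measures may also include normalization or centering steps that are not shown here.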

SUBMITTER: Comin M

PROVIDER: S-EPMC4989896 | biostudies-literature | 2016 Aug

REPOSITORIES: biostudies-literature

Publications

Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values.

Comin Matteo, Schimd Michele

BMC Medical Genomics, 2016-08-12

PMID: 27535823

Similar Datasets

Reads Binning Improves Alignment-Free Metagenome Comparison.
| S-EPMC6881972 | biostudies-literature

Next generation sequencing reads comparison with an alignment-free distance.
| S-EPMC4265526 | biostudies-literature

Alignment-free sequence comparison based on next-generation sequencing reads.
| S-EPMC3581251 | biostudies-literature

Fast alignment-free sequence comparison using spaced-word frequencies.
| S-EPMC4080745 | biostudies-literature

Alignment-free genomic sequence comparison using FCGR and signal processing.
| S-EPMC6937637 | biostudies-literature

New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing.
| S-EPMC4017329 | biostudies-literature

S-conLSH: alignment-free gapped mapping of noisy long reads.
| S-EPMC7879691 | biostudies-literature

Quality measures for protein alignment benchmarks.
| S-EPMC2853116 | biostudies-literature

GenomeScope: fast reference-free genome profiling from short reads.
| S-EPMC5870704 | biostudies-literature

Re-alignment of the unmapped reads with base quality score.
| S-EPMC4402702 | biostudies-literature