Dataset Information

Alignment-free genomic sequence comparison using FCGR and signal processing.


ABSTRACT: BACKGROUND: Alignment-free methods of genomic comparison offer the possibility of scaling to large data sets of nucleotide sequences comprising several thousand or more base pairs. Such methods can be used to deduce "nearby" species in a reference data set, or to construct phylogenetic trees. RESULTS: We describe one such method that gives quite strong results. We use the Frequency Chaos Game Representation (FCGR) to create images from such sequences. We then reduce dimension, first using a Fourier trig transform, followed by a Singular Value Decomposition (SVD). This gives vectors of modest length. These in turn are used for fast sequence lookup, construction of phylogenetic trees, and classification of virus genomic data. We illustrate the accuracy and scalability of this approach on several benchmark test sets. CONCLUSIONS: The tandem of FCGR and dimension reduction using Fourier-type transforms and SVD provides a powerful approach for alignment-free genomic comparison. Results compare favorably with, and often surpass, the best results reported in prior literature. Good scalability is also observed.
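
The first two stages named in the abstract (FCGR image, then a Fourier-type transform) can be illustrated with a minimal sketch. This is not the author's implementation: the corner assignment for A, C, G, T, the k-mer length `k`, the choice of an unnormalized DCT-II as the "Fourier trig transform", and the `keep` block size are all assumptions made for illustration, and the helper names `fcgr`, `dct2_1d`, `dct2_2d`, and `features` are hypothetical. The SVD stage is left out because it operates on a matrix of many such vectors and would normally rely on a numerical library.

```python
import math

# Assumed corner convention (A, C, G, T -> bits); other layouts are in use.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Count every k-mer of seq into a 2^k x 2^k grid."""
    n = 1 << k
    grid = [[0] * n for _ in range(n)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(ch not in CORNERS for ch in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        for ch in kmer:
            bx, by = CORNERS[ch]
            x = (x << 1) | bx  # each base contributes one bit per axis
            y = (y << 1) | by
        grid[y][x] += 1
    return grid


def dct2_1d(v):
    """Unnormalized DCT-II of a list of numbers."""
    n = len(v)
    return [
        sum(v[j] * math.cos(math.pi * (j + 0.5) * u / n) for j in range(n))
        for u in range(n)
    ]


def dct2_2d(m):
    """Separable 2D DCT-II: transform rows, then columns."""
    rows = [dct2_1d(r) for r in m]
    cols = [dct2_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def features(seq, k=3, keep=4):
    """Normalized FCGR -> DCT -> low-frequency keep x keep block, flattened."""
    grid = fcgr(seq, k)
    total = sum(map(sum, grid)) or 1  # avoid dividing by zero on short input
    freq = [[c / total for c in row] for row in grid]
    coeffs = dct2_2d(freq)
    return [coeffs[r][c] for r in range(keep) for c in range(keep)]
```

Calling `features(sequence)` on each genome would give one short vector per sequence. Stacking those vectors as rows of a matrix and applying an SVD would then give the further reduction the abstract describes, with distances between the reduced vectors serving for lookup or tree construction.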

SUBMITTER: Lichtblau D 

PROVIDER: S-EPMC6937637 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Alignment-free genomic sequence comparison using FCGR and signal processing.

Lichtblau Daniel D  

BMC Bioinformatics, 2019-12-30, 1


Similar Datasets

| S-EPMC4230918 | biostudies-literature
| S-EPMC4080745 | biostudies-literature
| S-EPMC3799466 | biostudies-literature
| S-EPMC6377666 | biostudies-literature
| S-EPMC6659240 | biostudies-literature
| S-EPMC5786891 | biostudies-literature
| S-EPMC3123933 | biostudies-literature
| S-EPMC2818754 | biostudies-literature
| S-EPMC5627421 | biostudies-literature
| S-EPMC3704055 | biostudies-literature