Dataset Information

A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance.


ABSTRACT: Classification of DNA sequences is an important problem in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are now popular, but the manual intervention they require usually decreases accuracy. In addition, most methods neglect the interactions among nucleotides. Here we propose a new Accumulated Natural Vector (ANV) method, which represents each DNA sequence by a point in ℝ¹⁸. By calculating the Accumulated Indicator Functions of the nucleotides, we obtain an Accumulated Natural Vector for each sequence. This vector not only captures the distribution of each nucleotide but also provides the covariance among nucleotides. Global comparison of DNA sequences or genomes can therefore be done easily in ℝ¹⁸. Tests of ANV on datasets of different sizes and types demonstrate the accuracy and time efficiency of the proposed method.
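
The abstract does not give the formulas, so the following is only a rough sketch of how an 18-dimensional vector of this kind could be assembled. It assumes that the 18 components are 4 nucleotide counts, 4 means and 4 variances of the accumulated indicator functions (running counts), and 6 pairwise covariances between them. That decomposition, the normalisation by sequence length, the Euclidean distance, and the names `accumulated_indicator`, `anv_sketch`, and `distance` are all assumptions made for illustration, not the authors' definitions.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def accumulated_indicator(seq, k):
    """Running count of nucleotide k: entry i is the number of k's in seq[:i + 1]."""
    total = 0
    running = []
    for ch in seq:
        total += (ch == k)
        running.append(total)
    return running


def anv_sketch(seq):
    """Illustrative 18-dimensional vector (assumed layout, not the paper's exact one):
    4 counts + 4 means + 4 variances + 6 pairwise covariances of the running counts."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")

    acc = {k: accumulated_indicator(seq, k) for k in NUCLEOTIDES}

    # Total occurrences of each nucleotide (last value of each running count).
    counts = [acc[k][-1] for k in NUCLEOTIDES]

    # Mean and variance of each running count, normalised by sequence length
    # (this normalisation is an assumption).
    means = {k: sum(acc[k]) / n for k in NUCLEOTIDES}
    variances = [
        sum((v - means[k]) ** 2 for v in acc[k]) / n for k in NUCLEOTIDES
    ]

    # Covariance between the running counts of each unordered nucleotide pair.
    covariances = [
        sum((a - means[p]) * (b - means[q]) for a, b in zip(acc[p], acc[q])) / n
        for p, q in combinations(NUCLEOTIDES, 2)
    ]

    return counts + [means[k] for k in NUCLEOTIDES] + variances + covariances


def distance(seq_a, seq_b):
    """Euclidean distance between the two sketch vectors (assumed metric)."""
    return math.dist(anv_sketch(seq_a), anv_sketch(seq_b))
```

With vectors of this form, pairwise distances between sequences can be fed to any standard clustering routine without aligning the sequences first.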

SUBMITTER: Dong R 

PROVIDER: S-EPMC6465635 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance.

Rui Dong, Lily He, Rong Lucy He, Stephen S.-T. Yau

Frontiers in Genetics, 2019-04-09


Similar Datasets

| S-EPMC4958985 | biostudies-literature
| S-EPMC8756197 | biostudies-literature
| S-EPMC2778338 | biostudies-literature
| S-EPMC6047369 | biostudies-literature
| S-EPMC3892691 | biostudies-literature
| S-EPMC6705769 | biostudies-literature
| S-EPMC3443659 | biostudies-literature
| S-EPMC7437474 | biostudies-literature
| S-EPMC2867492 | biostudies-literature
| S-EPMC10752928 | biostudies-literature