Dataset Information

A novel model for DNA sequence similarity analysis based on graph theory.

ABSTRACT: Determining sequence similarity is one of the major steps in computational phylogenetic studies. Over evolutionary history, DNA has undergone not only mutations of individual nucleotides but also subsequent rearrangements. A major task for computational biologists has therefore been to develop mathematical descriptors for similarity analysis that capture information about these different mutation phenomena simultaneously. In this paper, instead of basing the descriptors on traditional representations (e.g., nucleotide frequency or geometric representations), we construct novel mathematical descriptors based on graph theory. Specifically, for each DNA sequence we set up a weighted directed graph, and the adjacency matrix of that graph is used to induce a representative vector for the sequence. This approach measures similarity based on both the ordering and the frequency of nucleotides, so that more information is taken into account. As an application, the method is tested on a set of 0.9-kb mtDNA sequences from twelve primate species. The output phylogenetic trees obtained with various distance estimations all have the same topology and are generally consistent with results reported in earlier studies, which supports the effectiveness of the new method. We also test the method on a simulated data set, where it performs better than the traditional global alignment method when subsequent rearrangements occur frequently during evolutionary history.
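The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not state how edges are weighted, so everything specific here is an assumption made for illustration: the four nucleotides serve as vertices, every ordered pair of positions i < j contributes weight 1/(j - i)^alpha to the arc from the nucleotide at i to the nucleotide at j, and the 4x4 adjacency matrix is flattened into a 16-dimensional vector. The names `graph_vector`, `euclidean`, and the parameter `alpha` are hypothetical and do not come from the paper.

```python
from itertools import product
from math import sqrt

NUCLEOTIDES = "ACGT"  # assumed vertex set of the directed graph


def graph_vector(seq, alpha=2.0):
    """Flatten an assumed 4x4 weighted adjacency matrix into a 16-dim vector.

    Assumption (not stated in the abstract): each ordered pair of positions
    i < j adds 1 / (j - i) ** alpha to the arc seq[i] -> seq[j].
    """
    index = {n: k for k, n in enumerate(NUCLEOTIDES)}
    matrix = [[0.0] * 4 for _ in range(4)]
    seq = seq.upper()
    for i in range(len(seq)):
        if seq[i] not in index:
            continue  # skip ambiguous symbols such as 'N'
        for j in range(i + 1, len(seq)):
            if seq[j] not in index:
                continue
            matrix[index[seq[i]]][index[seq[j]]] += 1.0 / (j - i) ** alpha
    # Row-major flattening: entry 4 * r + c is the arc NUCLEOTIDES[r] -> NUCLEOTIDES[c].
    return [matrix[r][c] for r, c in product(range(4), repeat=2)]


def euclidean(u, v):
    """Euclidean distance between two representative vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    d = euclidean(graph_vector("ACGTACGT"), graph_vector("ACGTTGCA"))
    print(f"distance: {d:.4f}")
```

Under these assumptions the vector is sensitive to nucleotide order as well as frequency: `graph_vector("AC")` and `graph_vector("CA")` differ even though both sequences have the same nucleotide counts. The double loop is quadratic in sequence length, which is acceptable for sequences of roughly 0.9 kb but would need a gap cutoff for much longer inputs.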

SUBMITTER: Qi X

PROVIDER: S-EPMC3204935 | biostudies-literature | 2011

REPOSITORIES: biostudies-literature


Publications

A novel model for DNA sequence similarity analysis based on graph theory.

Xingqin Qi, Qin Wu, Yusen Zhang, Eddie Fuller, Cun-Quan Zhang

Evolutionary Bioinformatics Online, 2011-10-04

PMID: 22065497

Similar Datasets

A novel model for protein sequence similarity analysis based on spectral radius. | S-EPMC7094169 | biostudies-literature

Integrating species and interactions into similarity metrics: a graph theory-based approach to understanding community similarity. | S-EPMC6546078 | biostudies-literature

Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. | S-EPMC7022958 | biostudies-literature

Using Sequence Similarity Based on CKSNP Features and a Graph Neural Network Model to Identify miRNA-Disease Associations. | S-EPMC9602123 | biostudies-literature

A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks. | S-EPMC5054945 | biostudies-literature

Self-similarity analysis of eubacteria genome based on weighted graph. | S-EPMC7094106 | biostudies-literature

Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding. | S-EPMC8493040 | biostudies-literature

DTiGEMS+: drug-target interaction prediction using graph embedding, graph mining, and similarity-based techniques. | S-EPMC7325230 | biostudies-literature

Cross chromosomal similarity for DNA sequence compression. | S-EPMC2533061 | biostudies-literature

A tensor-based formulation of hetero-functional graph theory. | S-EPMC9637230 | biostudies-literature