
Dataset Information


A novel model for protein sequence similarity analysis based on spectral radius.


ABSTRACT: Advances in sequencing technologies have led to a rapid increase in the number and diversity of biological sequences, which has facilitated developments in sequence research. In this paper, we present a new method for analyzing protein sequence similarity. We calculated the spectral radii of the 20 amino acids (AAs) and put forward a novel 2-D graphical representation of protein sequences. To characterize protein sequences numerically, three groups of features were extracted, relating to statistical measurements, dynamics measurements, and the fluctuation complexity of the sequences. With the resulting feature vector, two models, one using Gaussian kernel similarity and one using cosine similarity, were built to measure the similarity between sequences. We applied our method to analyze the similarities and dissimilarities of four data sets. Both proposed models produced consistent results, with improvements over those obtained by ClustalW analysis. The novel approach we present in this study may therefore benefit protein research in medical and scientific fields.
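
The two similarity measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of cosine similarity and the Gaussian (RBF) kernel; the function names, the `sigma` bandwidth parameter, and the example feature vectors are assumptions made for illustration and are not taken from the paper, whose exact feature extraction and parameter choices are not given in this record.

```python
import math
from typing import Sequence


def _check_same_length(u: Sequence[float], v: Sequence[float]) -> None:
    # Both feature vectors must describe the same set of features.
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (range [-1, 1])."""
    _check_same_length(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def gaussian_kernel_similarity(
    u: Sequence[float], v: Sequence[float], sigma: float = 1.0
) -> float:
    """Gaussian (RBF) kernel exp(-||u - v||^2 / (2 * sigma^2)), range (0, 1].

    `sigma` is a hypothetical bandwidth parameter; the paper's value is unknown.
    """
    _check_same_length(u, v)
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Made-up feature vectors standing in for two protein sequences.
    seq_a = [0.12, 0.80, 0.33]
    seq_b = [0.10, 0.75, 0.40]
    print(cosine_similarity(seq_a, seq_b))
    print(gaussian_kernel_similarity(seq_a, seq_b, sigma=0.5))
```

In both measures a larger value indicates more similar sequences. The Gaussian kernel depends on the distance between vectors and is therefore sensitive to feature scale, whereas cosine similarity depends only on their direction.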

SUBMITTER: Wu C 

PROVIDER: S-EPMC7094169 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature


Publications

A novel model for protein sequence similarity analysis based on spectral radius.

Wu Chuanyan, Gao Rui, De Marinis Yang, Zhang Yusen

Journal of Theoretical Biology, 2018-03-07



Similar Datasets

| S-EPMC1131888 | biostudies-literature
| S-EPMC3204935 | biostudies-literature
| S-EPMC8855713 | biostudies-literature
| S-EPMC6454479 | biostudies-literature
| S-EPMC2760442 | biostudies-literature
| S-EPMC7660437 | biostudies-literature
| S-EPMC2935381 | biostudies-literature
| S-EPMC5054945 | biostudies-literature
| S-EPMC6530227 | biostudies-literature
| S-EPMC5974305 | biostudies-literature