Dataset Information

A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector.


ABSTRACT: Similarity/dissimilarity analysis is a key way of understanding the biology of an organism by tracing the origin of new genes and sequences. Sequence data are grouped according to biological relationships, and the number of sequences in any group can grow every day. Existing alignment-free methods demonstrate the utility of their approaches by producing a similarity/dissimilarity matrix. Although this matrix is clear, it measures the degree of similarity only between individual sequences. In our work, a representative is introduced for each of three groups of protein sequences, and a similarity/dissimilarity vector based on the group representative is evaluated instead of the usual similarity/dissimilarity matrix. The approach is applied to three selected groups of protein sequences: beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences. A cross-group comparison is carried out to confirm the distinctness of each group. A qualitative comparison of our approach with previous articles and with the phylogenetic tree of these protein sequences demonstrated its utility.
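
The contrast drawn in the abstract between a pairwise matrix and a vector measured against a group representative can be illustrated with a minimal sketch. The abstract does not say how the representative or the distance is defined, so everything below is an assumption made for illustration only: sequences are described by amino-acid composition, the representative is taken as the component-wise mean of the group, and dissimilarity is Euclidean distance. The function names and the toy sequences are hypothetical and are not the authors' method or data.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def group_representative(seqs):
    """ASSUMPTION: representative = component-wise mean of composition vectors."""
    vectors = [composition(s) for s in seqs]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def euclidean(u, v):
    """ASSUMPTION: Euclidean distance stands in for the dissimilarity measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dissimilarity_vector(seqs, representative):
    """One value per sequence, measured against the group representative."""
    return [euclidean(composition(s), representative) for s in seqs]


def dissimilarity_matrix(seqs):
    """The usual pairwise alternative: one value per pair of sequences."""
    vectors = [composition(s) for s in seqs]
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Toy sequences invented for this sketch (not real beta globin data).
group = ["MVHLTPEEK", "MVHLTAEEK", "MVHFTAEEK"]
rep = group_representative(group)
vec = dissimilarity_vector(group, rep)
mat = dissimilarity_matrix(group)
print(len(vec))               # number of entries in the vector
print(len(mat), len(mat[0]))  # dimensions of the pairwise matrix
```

Under these assumptions, adding a sequence to a group adds one entry to the vector, whereas the pairwise matrix gains a full row and column.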

SUBMITTER: Abd Elwahaab MA 

PROVIDER: S-EPMC6530227 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector.

Abd Elwahaab MA, Abo-Elkhier MM, Abo El Maaty MI

BioMed Research International, 2019-05-08


Similar Datasets

| S-EPMC4068907 | biostudies-literature
| S-EPMC10622715 | biostudies-literature
| S-EPMC1976428 | biostudies-literature
| S-EPMC6391537 | biostudies-literature
| S-EPMC4461267 | biostudies-literature
| S-EPMC2760442 | biostudies-literature
| S-EPMC5974305 | biostudies-literature
| S-EPMC2725436 | biostudies-literature
| S-EPMC6893242 | biostudies-literature
| S-EPMC11255384 | biostudies-literature