Dataset Information

A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins.


ABSTRACT: Phylogenetic analysis of highly divergent protein sequences remains challenging because prevailing phylogenetic methods are of limited effectiveness on such sequences. Here, we propose a sequence-based evolutionary distance algorithm, termed sequence distance (SD), that incorporates site-to-site correlations within protein sequences into distance estimation. Within protein superfamilies, SD effectively distinguishes evolutionary relationships both within and between protein families. It produces phylogenetic trees that closely agree with structure-based trees even when sequence identity is below 20%. SD correlates strongly with protein structural similarity. It can also compute evolutionary distances for thousands of protein pairs within seconds on a single CPU, which is significantly faster than most protein structure prediction methods, because those methods demand substantial computational resources and long run times. SD is expected to significantly advance phylogenetics by giving researchers a more accurate and reliable tool for exploring evolutionary relationships.

SUBMITTER: Cao W 

PROVIDER: S-EPMC10662474 | biostudies-literature | 2023 Nov

REPOSITORIES: biostudies-literature

Publications

A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins.

Cao Wei, Wu Lu-Yun, Xia Xia-Yu, Chen Xiang, Wang Zhi-Xin, Pan Xian-Ming

Scientific Reports, 20 Nov 2023, issue 1


Similar Datasets

| S-EPMC3325999 | biostudies-literature
| S-EPMC3639960 | biostudies-literature
| S-EPMC6502352 | biostudies-literature
| S-EPMC4687978 | biostudies-literature
| S-EPMC9392791 | biostudies-literature
| S-EPMC3429886 | biostudies-literature
| S-EPMC4770315 | biostudies-literature
| S-EPMC4166927 | biostudies-literature
| S-EPMC2775108 | biostudies-literature
| S-EPMC3995811 | biostudies-literature