Dataset Information

WSE, a new sequence distance measure based on word frequencies.


ABSTRACT: In this article, we present a new distance metric, the Weighted Sequence Entropy (WSE), based on the short word composition of biological sequences. As a revision of the classical relative entropy (RE), our metric (1) works equivalently with RE in the case of small k, and (2) avoids the degeneracy that arises when some word types are absent in one sequence but not in the other. Experiments on 25 viruses including SARS-CoVs show that our method and RE give exactly the same phylogenetic tree when word length k ≤ 3. When k > 3, our method still works and gets a convergent phylogenetic topology, but the RE gives degenerate results.
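
The degeneracy mentioned in the abstract can be illustrated with a minimal sketch of the classical relative entropy computed on k-word frequencies. This is not the WSE metric, whose formula is not given in this record. The function names `word_frequencies` and `relative_entropy`, the use of overlapping words, the base-2 logarithm, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_frequencies(seq, k):
    """Relative frequencies of the overlapping length-k words in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def relative_entropy(p, q):
    """Classical RE(p || q) = sum over words of p(w) * log2(p(w) / q(w)).

    Returns math.inf when a word occurs in p but not in q, which is
    the degenerate case described in the abstract.
    """
    total = 0.0
    for word, pw in p.items():
        qw = q.get(word, 0.0)
        if qw == 0.0:
            return math.inf
        total += pw * math.log2(pw / qw)
    return total


if __name__ == "__main__":
    # Toy sequences (hypothetical, not taken from the 25-virus dataset).
    a = "ACGTACGTAGCTAGCTAGGATCCA"
    b = "ACGGACGTTGCAAGCTTGGATACA"
    for k in (1, 2, 3, 4):
        fa, fb = word_frequencies(a, k), word_frequencies(b, k)
        print(k, relative_entropy(fa, fb))
```

As k grows, the number of possible words (4^k for nucleotides) quickly exceeds what a sequence can contain, so a word present in one sequence is increasingly likely to be missing from the other, and the plain RE becomes infinite.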

SUBMITTER: Wang J 

PROVIDER: S-EPMC7185439 | biostudies-literature | 2008 Sep

REPOSITORIES: biostudies-literature

Publications

WSE, a new sequence distance measure based on word frequencies.

Wang Jun, Zheng Xiaoqi

Mathematical Biosciences, 2008-06-12, issue 1



Similar Datasets

| S-EPMC4080745 | biostudies-literature
| S-EPMC2880003 | biostudies-literature
| S-EPMC3038458 | biostudies-literature
| PRJEB52191 | ENA
| S-EPMC2478692 | biostudies-literature
| S-EPMC8114813 | biostudies-literature
| S-EPMC5832726 | biostudies-literature
| S-EPMC7303933 | biostudies-literature
| S-EPMC4905274 | biostudies-literature
| S-EPMC2815661 | biostudies-literature