Dataset Information

A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis.


ABSTRACT: In this article we address the problem of phylogenetic inference from nucleic acid data containing missing bases. We introduce a new effective approach, called "Probabilistic estimation of missing values" (PEMV), which estimates unknown nucleotides before the evolutionary distances between sequences are computed. We show that the new method improves the accuracy of phylogenetic inference compared to the existing methods "Ignoring Missing Sites" (IMS) and "Proportional Distribution of Missing and Ambiguous Bases" (PDMAB), the latter included in the PAUP software [26]. The proposed strategy for estimating missing nucleotides is based on probabilistic formulae developed in the framework of the Jukes-Cantor [10] and Kimura 2-parameter [11] models. The relative performance of the new method was assessed through simulations carried out with the SeqGen program [20] for data generation and the BioNJ method [7] for inferring phylogenies. We also compared the new method to the DNAML program [5] and "Matrix Representation using Parsimony" (MRP) [13], [19], using an example of 66 eutherian mammals originally analyzed in [17].
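For orientation, the baseline that the abstract calls "Ignoring Missing Sites" can be sketched as follows. This is a minimal illustration, not the authors' PEMV method or their code: it skips every site where either sequence lacks an unambiguous A, C, G or T, then applies the standard Jukes-Cantor and Kimura 2-parameter distance formulae to the remaining sites. The function names and the choice to treat any non-ACGT symbol as missing are assumptions made for this example.

```python
import math

# Assumption for this sketch: anything other than A, C, G, T counts as missing.
VALID = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_sites(seq_a, seq_b):
    """Return (comparable sites, transitions, transversions), skipping missing sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID or b not in VALID:
            continue  # IMS: ignore the site if either base is missing
        n += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - 4p/3), p = proportion of differing sites."""
    n, ts, tv = compare_sites(seq_a, seq_b)
    p = (ts + tv) / n
    if p >= 0.75:
        return math.inf  # the formula is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance: d = -(1/2) * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    n, ts, tv = compare_sites(seq_a, seq_b)
    P, Q = ts / n, tv / n  # transition and transversion proportions
    a, b = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        return math.inf
    return -0.5 * math.log(a * math.sqrt(b))
```

A matrix of such pairwise distances is the kind of input a distance-based method such as BioNJ works from. According to the abstract, PEMV differs in that it estimates the missing nucleotides before this distance step rather than discarding the affected sites.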

SUBMITTER: Diallo AB 

PROVIDER: S-EPMC2674658 | biostudies-literature | 2007 Feb

REPOSITORIES: biostudies-literature

Publications

A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis.

Diallo Abdoulaye Baniré, Lapointe François-Joseph, Makarenkov Vladimir

Evolutionary Bioinformatics Online, 2007-02-01


Similar Datasets

| S-EPMC5469784 | biostudies-literature
| S-EPMC4915111 | biostudies-literature
| S-EPMC5920143 | biostudies-literature
| S-EPMC5548942 | biostudies-other
| S-EPMC6153696 | biostudies-literature
| S-EPMC6030081 | biostudies-other
| S-EPMC2943516 | biostudies-literature
| S-EPMC4071774 | biostudies-literature
| S-EPMC4165576 | biostudies-literature
| S-EPMC5582138 | biostudies-literature