Imputing missing distances in molecular phylogenetics.
Ontology highlight
ABSTRACT: Missing data are frequently encountered in molecular phylogenetics, but there has been no accurate distance imputation method available for distance-based phylogenetic reconstruction. The general framework for distance imputation is to explore tree space and distance values to find an optimal combination of output tree and imputed distances. Here I develop a least-square method coupled with multivariate optimization to impute multiple missing distance in a distance matrix or from a set of aligned sequences with missing genes so that some sequences share no homologous sites (whose distances therefore need to be imputed). I show that phylogenetic trees can be inferred from distance matrices with about 10% of distances missing, and the accuracy of the resulting phylogenetic tree is almost as good as the tree from full information. The new method has the advantage over a recently published one in that it does not assume a molecular clock and is more accurate (comparable to maximum likelihood method based on simulated sequences). I have implemented the function in DAMBE software, which is freely available at http://dambe.bio.uottawa.ca.
SUBMITTER: Xia X
PROVIDER: S-EPMC6063210 | biostudies-literature | 2018
REPOSITORIES: biostudies-literature
ACCESS DATA