Dataset Information

Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge.

ABSTRACT:

Background

Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches.

Results

In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and "concatenation" using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases.

Conclusions

Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on GitHub (http://github.com/ekmolloy/njmerge).
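
The abstract describes NJMerge as a polynomial-time extension of Neighbor Joining (NJ) that merges subset trees using a distance matrix. For orientation, the following is a minimal sketch of classic NJ only, not of NJMerge itself: it does not take subset trees as input and makes no attempt to reproduce how NJMerge uses them. The function name `neighbor_joining`, the dict-of-dicts input format, and the four-taxon example distances are all assumptions made for illustration; none of them come from the NJMerge repository.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Classic Neighbor Joining (topology only, no branch lengths).

    names: list of leaf labels (hypothetical input format).
    dist:  symmetric dict-of-dicts, dist[a][b] = distance between a and b.
    Returns the unrooted tree as a tuple of three subtrees (nested tuples).
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in names}  # copy so the caller's data is untouched

    while len(nodes) > 3:
        n = len(nodes)
        # Row sums: total distance from each node to all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        # Standard NJ criterion: Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        def q(pair):
            a, b = pair
            return (n - 2) * d[a][b] - r[a] - r[b]

        i, j = min(combinations(nodes, 2), key=q)

        # Join i and j under a new internal node, represented as a tuple.
        u = (i, j)
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    return tuple(nodes)


# Illustrative distances that fit the tree ((A,B),(C,D)) exactly.
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 5, ("A", "C"): 8, ("A", "D"): 11,
         ("B", "C"): 9, ("B", "D"): 12, ("C", "D"): 7}
dist = {a: {} for a in names}
for (a, b), value in pairs.items():
    dist[a][b] = dist[b][a] = value

tree = neighbor_joining(names, dist)
print(tree)  # A and B end up grouped as a cherry
```

In a divide-and-conquer pipeline of the kind the abstract outlines, a step like this would be run once over all species, with the subset trees from the base method additionally guiding the merge; that guidance is what this sketch leaves out.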

SUBMITTER: Molloy EK

PROVIDER: S-EPMC6642500 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge.

Molloy EK, Warnow T

Algorithms for Molecular Biology: AMB, 2019-07-19

PMID: 31360216

Similar Datasets

Precision Oncology in Sarcomas: Divide and Conquer. | S-EPMC7446356 | biostudies-literature

Divide and conquer: enriching environmental sequencing data. | S-EPMC1952108 | biostudies-literature

Multi-Level-Phase Deep Learning Using Divide-and-Conquer for Scaffolding Safety. | S-EPMC7177762 | biostudies-literature

A fast divide-and-conquer sparse Cox regression. | S-EPMC8036003 | biostudies-literature

Divide-and-Conquer Hartree-Fock Calculations on Proteins. | S-EPMC2853773 | biostudies-literature

A divide-and-conquer approach for genomic prediction in rubber tree using machine learning. | S-EPMC9605989 | biostudies-literature

Strategies for aggressive T-cell lymphoma: divide and conquer. | S-EPMC7727519 | biostudies-literature

Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model. | S-EPMC10691101 | biostudies-literature

Divide and conquer? Size adjustment with allometry and intermediate outcomes. | S-EPMC5679152 | biostudies-literature

Kart: a divide-and-conquer algorithm for NGS read alignment. | S-EPMC5860120 | biostudies-literature