Unknown

Dataset Information

0

FastTree: computing large minimum evolution trees with profiles instead of a distance matrix.


ABSTRACT: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N(2)) space and O(N(2)L) time, but FastTree requires just O(NLa + N ) memory and O(N log (N)La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

SUBMITTER: Price MN 

PROVIDER: S-EPMC2693737 | biostudies-literature | 2009 Jul

REPOSITORIES: biostudies-literature

altmetric image

Publications

FastTree: computing large minimum evolution trees with profiles instead of a distance matrix.

Price Morgan N MN   Dehal Paramvir S PS   Arkin Adam P AP  

Molecular biology and evolution 20090417 7


Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor inter  ...[more]

Similar Datasets

| S-EPMC9934975 | biostudies-literature
| S-EPMC6420154 | biostudies-literature
| S-EPMC3141660 | biostudies-literature
| S-EPMC7015965 | biostudies-literature
| S-EPMC7682335 | biostudies-literature
| S-EPMC7141873 | biostudies-literature
| S-EPMC6319031 | biostudies-other
| S-EPMC10501364 | biostudies-literature
| S-EPMC4518679 | biostudies-literature
| S-EPMC3874668 | biostudies-literature