Dataset Information

Methods for automatic reference trees and multilevel phylogenetic placement.


ABSTRACT:

Motivation

In most metagenomic sequencing studies, the initial analysis step consists of assessing the evolutionary provenance of the sequences. Phylogenetic (or evolutionary) placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do, however, face certain limitations: the manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; and the number of taxa in the reference phylogeny should be small enough to allow for visual inspection of the results.

Results

We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence datasets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.
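As a rough illustration of what "constructing a representative sequence" from a group of database sequences can look like, the sketch below computes a simple majority-rule consensus over pre-aligned sequences. This is an assumption made for illustration only: the abstract does not say how representatives are built, and the function name `majority_consensus` is hypothetical and is not part of gappa.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return a majority-rule consensus of equally long, pre-aligned sequences.

    Illustrative sketch only; not the method described in the paper.
    Ties are broken by taking the alphabetically smallest character.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned):
        # Count each character in this alignment column (case-insensitive).
        counts = Counter(char.upper() for char in column)
        # Highest count wins; alphabetical order makes ties deterministic.
        best = min(counts.items(), key=lambda item: (-item[1], item[0]))[0]
        consensus.append(best)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical toy alignment of three sequences from one taxonomic group.
    print(majority_consensus(["ACGT", "ACGA", "TCGA"]))
```

A single consensus per taxonomic group would keep the reference tree small, which is one plausible way to address the size limitation named in the Motivation; the actual procedure may differ.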

Availability and implementation

Freely available under GPLv3 at http://github.com/lczech/gappa.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Czech L 

PROVIDER: S-EPMC6449752 | biostudies-literature | 2019 Apr

REPOSITORIES: biostudies-literature

Publications

Methods for automatic reference trees and multilevel phylogenetic placement.

Lucas Czech, Pierre Barbera, Alexandros Stamatakis

Bioinformatics (Oxford, England), 2019-04-01, issue 7

Similar Datasets

| S-EPMC7049256 | biostudies-literature
| S-EPMC3935878 | biostudies-literature
| S-EPMC6538146 | biostudies-literature
| S-EPMC7924801 | biostudies-literature
| S-EPMC6623420 | biostudies-literature
| S-EPMC3098090 | biostudies-other
| S-EPMC3813836 | biostudies-other
| S-EPMC5447242 | biostudies-literature
| S-EPMC4947568 | biostudies-literature
| S-EPMC1160516 | biostudies-literature