Dataset Information

SCRAPP: A tool to assess the diversity of microbial samples from phylogenetic placements.


ABSTRACT: Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present scrapp, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. scrapp employs a novel approach to clustering phylogenetic placements, called placement space clustering, to efficiently perform dimensionality reduction so that it scales to large data volumes. Furthermore, it uses the phylogeny-aware molecular species delimitation method mPTP to quantify diversity. We evaluated scrapp using both simulated and empirical data sets. We used simulated data to verify our approach. Tests on an empirical data set show that scrapp-derived metrics can classify samples by their diversity-correlated features as well as or better than existing, commonly used approaches. scrapp is available at https://github.com/pbdas/scrapp.

SUBMITTER: Barbera P 

PROVIDER: S-EPMC7756409 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature

Publications

SCRAPP: A tool to assess the diversity of microbial samples from phylogenetic placements.

Barbera P, Czech L, Lutteropp S, Stamatakis A

Molecular Ecology Resources, 2020-10-09, issue 1


Similar Datasets

| Accession | Repository |
| --- | --- |
| S-EPMC3810850 | biostudies-other |
| S-EPMC2654800 | biostudies-other |
| S-EPMC123827 | biostudies-literature |
| PRJNA784359 | ENA |
| PRJNA784358 | ENA |
| S-EPMC6427046 | biostudies-literature |
| S-EPMC6092336 | biostudies-literature |
| S-EPMC3527210 | biostudies-literature |
| S-EPMC2859756 | biostudies-literature |
| S-EPMC4891138 | biostudies-literature |