Dataset Information

Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences.

ABSTRACT: Molecular microbial ecology investigations often employ large marker gene datasets, for example, ribosomal RNAs, to represent the occurrence of single-cell genomes in microbial communities. Massively parallel DNA sequencing technologies enable extensive surveys of marker gene libraries that sometimes include nearly identical sequences. Computational approaches that rely on pairwise sequence alignments for similarity assessment and de novo clustering with de facto similarity thresholds to partition high-throughput sequencing datasets constrain fine-scale resolution descriptions of microbial communities. Minimum Entropy Decomposition (MED) provides a computationally efficient means to partition marker gene datasets into 'MED nodes', which represent homogeneous operational taxonomic units. By employing Shannon entropy, MED uses only the information-rich nucleotide positions across reads and iteratively partitions large datasets while omitting stochastic variation. When applied to analyses of microbiomes from two deep-sea cryptic sponges Hexadella dendritifera and Hexadella cf. dendritifera, MED resolved a key Gammaproteobacteria cluster into multiple MED nodes that are specific to different sponges, and revealed that these closely related sympatric sponge species maintain distinct microbial communities. MED analysis of a previously published human oral microbiome dataset also revealed that taxa separated by less than 1% sequence variation distributed to distinct niches in the oral cavity. The information theory-guided decomposition process behind the MED algorithm enables sensitive discrimination of closely related organisms in marker gene amplicon datasets without relying on extensive computational heuristics and user supervision.
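
As a rough illustration of the entropy-guided partitioning the abstract describes, the following minimal Python sketch computes Shannon entropy per aligned position and recursively splits reads on the highest-entropy position. It is a simplified sketch and not the authors' implementation: the function names and the `min_entropy` and `min_size` stopping parameters are hypothetical, reads are assumed to be pre-aligned and of equal length, and the handling of stochastic variation mentioned in the abstract is omitted.

```python
from collections import Counter
from math import log2


def column_entropy(reads, position):
    """Shannon entropy (in bits) of the characters at one aligned position."""
    counts = Counter(read[position] for read in reads)
    total = len(reads)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def decompose(reads, min_entropy=0.2, min_size=2):
    """Recursively split reads on their highest-entropy position.

    min_entropy and min_size are illustrative stopping parameters
    (assumptions for this sketch, not values taken from the paper).
    """
    if len(reads) < min_size:
        return [reads]
    length = len(reads[0])
    entropies = [column_entropy(reads, i) for i in range(length)]
    best = max(range(length), key=entropies.__getitem__)
    if entropies[best] <= min_entropy:
        return [reads]  # homogeneous enough: keep as a single node
    # Group reads by the character they carry at the most variable position.
    groups = {}
    for read in reads:
        groups.setdefault(read[best], []).append(read)
    nodes = []
    for base in sorted(groups):
        nodes.extend(decompose(groups[base], min_entropy, min_size))
    return nodes


reads = ["ACGT", "ACGT", "ACGT", "ATGT", "ATGT", "ATGA"]
nodes = decompose(reads)
for node in nodes:
    print(len(node), node[0])
```

In this toy input, position 2 (C versus T) carries the most entropy and is split first; the T group is then split again on its last position, so reads differing by a single nucleotide end up in separate nodes.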

SUBMITTER: Eren AM

PROVIDER: S-EPMC4817710 | biostudies-literature | 2015 Mar

REPOSITORIES: biostudies-literature

Publications

Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences.

Eren AM, Morrison HG, Lescault PJ, Reveillaud J, Vineis JH, Sogin ML

The ISME Journal, 2015 Mar 17, issue 4

PMID: 25325381

Similar Datasets

SAR minimum entropy autofocusing based on Prewitt operator.
| S-EPMC9916621 | biostudies-literature

Unsupervised statistical clustering of environmental shotgun sequences.
| S-EPMC2765972 | biostudies-literature

Highly efficient decomposition of ammonia using high-entropy alloy catalysts.
| S-EPMC6728353 | biostudies-literature

GibbsCluster: unsupervised clustering and alignment of peptide sequences.
| S-EPMC5570237 | biostudies-literature

Tensor-Decomposition-Based Unsupervised Feature Extraction in Single-Cell Multiomics Data Analysis.
| S-EPMC8468466 | biostudies-literature

Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data.
| S-EPMC7763286 | biostudies-literature

Unsupervised decomposition of natural monkey behavior into a sequence of motion motifs.
| S-EPMC11371840 | biostudies-literature

Relative Entropy and Minimum-Variance Pricing Kernel in Asset Pricing Model Evaluation.
| S-EPMC7517259 | biostudies-literature

Spatial Decomposition of Translational Water-Water Correlation Entropy in Binding Pockets.
| S-EPMC4819442 | biostudies-literature

Synthesis and Thermal Decomposition of High-Entropy Layered Rare Earth Hydroxychlorides.
| S-EPMC11013826 | biostudies-literature