Dataset Information

KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA.


ABSTRACT: The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors composed of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and percent fat.
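
As a rough illustration of the link the abstract draws between distance-based ordination and penalized regression, the sketch below converts a sample-by-sample distance matrix into a kernel by Gower centering (the same step used in principal coordinate analysis) and then fits a ridge-type regression through that kernel. It is a generic kernel ridge example on simulated data, not the authors' estimator: the function names `gower_center` and `kernel_ridge_fit`, the toy counts, the Euclidean distance standing in for an ecological distance, and the penalty value are all assumptions made for illustration.

```python
import numpy as np

def gower_center(D):
    """Convert an n x n distance matrix into a similarity (kernel) matrix.

    This is the double-centering step of principal coordinate analysis:
    K = -1/2 * J * D^2 * J, with J the centering matrix.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def kernel_ridge_fit(K, y, lam):
    """Solve (K + lam * I) alpha = y - mean(y) for the dual coefficients."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y - y.mean())

# Simulated data (hypothetical): 20 samples, 8 taxa.
rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(20, 8)) + 1          # pseudo-count avoids empty rows
X = counts / counts.sum(axis=1, keepdims=True)     # relative abundances, rows sum to 1
y = rng.normal(size=20)                            # placeholder phenotype

# Euclidean distance is a stand-in here; an ecological distance would go in its place.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
K = gower_center(D)

lam = 1.0                                          # assumed penalty value
alpha = kernel_ridge_fit(K, y, lam)
fitted = K @ alpha + y.mean()

# With a Euclidean distance, K equals Xc @ Xc.T for column-centered Xc,
# so taxon-level coefficients are recovered as Xc.T @ alpha.
Xc = X - X.mean(axis=0)
beta = Xc.T @ alpha
```

With a Euclidean distance this reduces to ordinary ridge regression on the centered abundances, which is why taxon-level coefficients can be read off directly. A non-Euclidean distance can yield a centered matrix with negative eigenvalues, which would need separate handling before it is used as a kernel.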

SUBMITTER: Randolph TW 

PROVIDER: S-EPMC6138053 | biostudies-literature | 2018 Mar

REPOSITORIES: biostudies-literature

Publications

KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA.

Randolph TW, Zhao S, Copeland W, Hullar M, Shojaie A

The Annals of Applied Statistics, 2018-03-09, 1


Similar Datasets

| S-EPMC6380514 | biostudies-literature
| S-EPMC4143805 | biostudies-literature
| S-EPMC3694815 | biostudies-literature
| S-EPMC8070328 | biostudies-literature
| S-EPMC4007772 | biostudies-literature
| S-EPMC3285536 | biostudies-literature
| S-EPMC3232376 | biostudies-literature
| S-EPMC4964314 | biostudies-literature
| S-EPMC9290657 | biostudies-literature
| S-EPMC4570290 | biostudies-literature