Dataset Information

Clustering co-abundant genes identifies components of the gut microbiome that are reproducibly associated with colorectal cancer and inflammatory bowel disease.


ABSTRACT: BACKGROUND: Whole-genome "shotgun" (WGS) metagenomic sequencing is an increasingly widely used tool for analyzing the metagenomic content of microbiome samples. While WGS data contain gene-level information, it can be challenging to analyze the millions of microbial genes that are typically found in microbiome experiments. To mitigate the ultrahigh dimensionality of gene-level metagenomics, it has been proposed to cluster genes by co-abundance to form Co-Abundant Gene groups (CAGs). However, exhaustive co-abundance clustering of millions of microbial genes across thousands of biological samples has previously been intractable because of the computational cost of performing trillions of pairwise comparisons. RESULTS: Here we present a novel computational approach to the analysis of WGS datasets in which microbial gene groups are the fundamental unit of analysis. We use the Approximate Nearest Neighbor heuristic for near-exhaustive average linkage clustering to group millions of genes by co-abundance. This results in thousands of high-quality CAGs representing complete and partial microbial genomes. We applied this method to publicly available WGS microbiome surveys and found that the resulting microbial CAGs associated with inflammatory bowel disease (IBD) and colorectal cancer (CRC) were highly reproducible and could be validated in multiple independent cohorts. CONCLUSIONS: This approach to gene-level metagenomics provides a powerful path forward for identifying the biological links between the microbiome and human health. By proposing a new computational approach for handling high-dimensional metagenomics data, we identified specific microbial gene groups that are associated with disease and that can be used to identify strains of interest for further preclinical and mechanistic experimentation.
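To make the idea of co-abundance clustering concrete, the following is a minimal toy sketch, not the authors' implementation. It performs exhaustive average linkage clustering on a handful of hypothetical genes, which is exactly the all-pairs computation the abstract describes as intractable at scale and which the Approximate Nearest Neighbor heuristic is meant to avoid. The cosine distance metric, the `max_distance` threshold of 0.2, the function names, and the abundance values are all illustrative assumptions.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two per-sample abundance vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero gene as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage_cags(abundance, max_distance=0.2):
    """Greedy agglomerative clustering with average linkage (toy scale only).

    abundance: dict mapping gene ID -> list of abundances, one per sample.
    Repeatedly merges the closest pair of clusters until no pair has an
    average pairwise distance at or below max_distance (an assumed threshold).
    """
    clusters = [[gene] for gene in sorted(abundance)]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        dists = [
            cosine_distance(abundance[g1], abundance[g2])
            for g1 in c1
            for g2 in c2
        ]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Exhaustive search over all cluster pairs; this is the step that
        # does not scale to millions of genes.
        (i, j), best = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical abundances of four genes across four samples.
toy_abundance = {
    "gene_a": [10, 0, 5, 0],
    "gene_b": [20, 0, 10, 0],
    "gene_c": [0, 8, 0, 4],
    "gene_d": [0, 16, 1, 8],
}

cags = average_linkage_cags(toy_abundance)
for cag in cags:
    print(cag)
```

In this toy input, `gene_a` and `gene_b` rise and fall together across samples, as do `gene_c` and `gene_d`, so each pair ends up in its own group. Cosine distance is chosen here because it ignores overall scale, so genes that differ only in copy number or sequencing depth still cluster together.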

SUBMITTER: Minot SS 

PROVIDER: S-EPMC6670193 | biostudies-literature | 2019 Aug

REPOSITORIES: biostudies-literature

Publications

Clustering co-abundant genes identifies components of the gut microbiome that are reproducibly associated with colorectal cancer and inflammatory bowel disease.

Minot Samuel S, Willis Amy D

Microbiome, 2019-08-01, 1

Similar Datasets

| S-EPMC6577315 | biostudies-literature
| S-EPMC3957428 | biostudies-literature
2013-12-05 | E-GEOD-46761 | biostudies-arrayexpress
2013-12-05 | GSE46761 | GEO
| S-EPMC6495231 | biostudies-literature
| S-EPMC8780135 | biostudies-literature
| S-EPMC6096775 | biostudies-literature
| S-EPMC6311932 | biostudies-literature
| S-EPMC6131705 | biostudies-literature
| S-EPMC6342642 | biostudies-literature