Dataset Information

Bayesian correlated clustering to integrate multiple datasets.


ABSTRACT:

Motivation

The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets.
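The abstract does not spell out how the agreement parameters enter the model, so the following is only a minimal sketch under a stated assumption: for a single item, the prior probability of a tuple of component labels (one label per dataset) is the product of the per-dataset mixture weights, multiplied by a factor (1 + phi) for every pair of datasets that place the item in the same component. The function names, the dictionary layout for `phi`, and the toy weights are all hypothetical and chosen for illustration.

```python
from itertools import product


def joint_allocation_prior(weights, phi):
    """Prior over label tuples for one item across K datasets.

    ASSUMPTION (not stated in the abstract): agreement between datasets
    k and l multiplies the prior by (1 + phi[(k, l)]).

    weights: list of K lists; weights[k][c] is the mixture weight of
             component c in dataset k (all datasets share the same
             number of components in this sketch).
    phi:     dict mapping pairs (k, l), k < l, to a non-negative
             agreement parameter.
    """
    n_datasets = len(weights)
    n_components = len(weights[0])
    unnormalised = {}
    for labels in product(range(n_components), repeat=n_datasets):
        value = 1.0
        # Independent part: one mixture weight per dataset.
        for k, c in enumerate(labels):
            value *= weights[k][c]
        # Dependent part: up-weight every pair of datasets that agree.
        for (k, l), strength in phi.items():
            if labels[k] == labels[l]:
                value *= 1.0 + strength
        unnormalised[labels] = value
    total = sum(unnormalised.values())
    return {labels: v / total for labels, v in unnormalised.items()}


def agreement_probability(prior, k, l):
    """Prior probability that datasets k and l give the item the same label."""
    return sum(p for labels, p in prior.items() if labels[k] == labels[l])


# Toy setting: two datasets, two equally weighted components each.
toy_weights = [[0.5, 0.5], [0.5, 0.5]]
independent = joint_allocation_prior(toy_weights, {(0, 1): 0.0})
dependent = joint_allocation_prior(toy_weights, {(0, 1): 3.0})
```

In this toy setting, an agreement parameter of 0 leaves the two datasets independent (agreement probability 0.5), while a value of 3 raises the prior probability that both datasets allocate the item to the same component to 0.8. A full sampler would also update the weights, the agreement parameters and the component parameters, which this sketch does not attempt.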

Results

Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.

SUBMITTER: Kirk P 

PROVIDER: S-EPMC3519452 | biostudies-literature | 2012 Dec

REPOSITORIES: biostudies-literature

Publications

Bayesian correlated clustering to integrate multiple datasets.

Kirk P, Griffin JE, Savage RS, Ghahramani Z, Wild DL

Bioinformatics (Oxford, England), 2012-10-09, issue 24


Similar Datasets

| S-EPMC7500954 | biostudies-literature
| S-EPMC4390817 | biostudies-literature
| S-EPMC7750932 | biostudies-literature
| S-EPMC2855327 | biostudies-literature
| S-EPMC3605602 | biostudies-literature
| S-EPMC3789539 | biostudies-other
| S-EPMC4076058 | biostudies-literature
| S-EPMC6461295 | biostudies-literature
| S-EPMC3140961 | biostudies-literature
| S-EPMC5734272 | biostudies-literature