Dataset Information

Model-based clustering of multi-tissue gene expression data.

ABSTRACT:

Motivation

Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they require either that all tissues be analyzed independently, ignoring dependencies and similarities between tissues, or that all tissues be merged into a single, monolithic dataset, ignoring the individual characteristics of tissues.

Results

We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity and which produces a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions than clusters from alternative approaches. We further demonstrate that revamp yields easily interpretable associations between multi-tissue gene expression and key coronary artery disease processes and clinical phenotypes in the STAGE individuals.
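
The cluster structure described above (a conserved core plus tissue-specific differential gene sets) can be pictured with a small sketch. This is only an illustration of the output shape, not the Bayesian inference performed by revamp: the class `MultiTissueCluster`, the function `split_core_and_differential`, the tissue labels, and the gene names are all hypothetical, and the sketch simplifies "subsets of tissues" to one differential set per tissue.

```python
from dataclasses import dataclass, field


@dataclass
class MultiTissueCluster:
    """Toy container: a shared core plus per-tissue differential gene sets."""
    core: frozenset
    differential: dict = field(default_factory=dict)  # tissue -> frozenset of genes

    def members(self, tissue):
        """Genes belonging to the cluster in the given tissue."""
        return self.core | self.differential.get(tissue, frozenset())


def split_core_and_differential(members_by_tissue):
    """Split per-tissue memberships into a conserved core and tissue-specific extras.

    Illustrative post-processing only (assumption); revamp infers these sets
    jointly with a Bayesian model rather than by set intersection.
    """
    gene_sets = [frozenset(genes) for genes in members_by_tissue.values()]
    core = frozenset.intersection(*gene_sets)  # genes present in every tissue
    differential = {
        tissue: frozenset(genes) - core
        for tissue, genes in members_by_tissue.items()
    }
    return MultiTissueCluster(core=core, differential=differential)


# Hypothetical memberships of one cluster in three tissues (made-up gene names).
example = {
    "liver": {"G1", "G2", "G3"},
    "aorta": {"G1", "G2", "G4"},
    "muscle": {"G1", "G2"},
}
cluster = split_core_and_differential(example)
```

Here `cluster.core` holds the genes shared by all three tissues, and `cluster.members("aorta")` returns the core together with the genes specific to that tissue.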

Availability and implementation

Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Erola P

PROVIDER: S-EPMC7162352 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature

Publications

Model-based clustering of multi-tissue gene expression data.

Pau Erola, Johan L. M. Björkegren, Tom Michoel

Bioinformatics (Oxford, England), 2020-03-01, issue 6

PMID: 31688915

Similar Datasets

Project description:

BACKGROUND: Longitudinally collected gene expression data provide an opportunity to investigate the dynamic behavior of gene expression and are crucial for establishing causal links between molecular-level changes and disease development and progression. Clustering subjects on the basis of time-course expression data may improve our understanding of the temporal expression patterns that result in disease phenotypes. Although numerous methods exist for clustering subjects using gene expression data, most are not suitable when expression measurements are collected repeatedly over a time course.

METHODS: We present a modified version of the recursively partitioned mixture model (RPMM) for clustering subjects based on longitudinally collected gene expression data. In the proposed time-course RPMM (TC-RPMM), subjects are clustered on the basis of their temporal profiles of gene expression using a mixture of mixed effects models framework. This framework captures changes in gene expression over time and models the autocorrelation between repeated gene expression measurements for the same subject. We assessed the performance of TC-RPMM using extensive simulation studies and a dataset from a multi-center research study of inflammation and response to injury (www.gluegrant.org), which consisted of time-course gene expression data for 140 subjects.

RESULTS: Our simulation studies encompassed several different scenarios and were aimed at assessing the ability of TC-RPMM to correctly recover true class memberships when the expression trajectories that characterized those classes differed. Overall, the simulation studies revealed favorable performance of TC-RPMM compared to competing approaches; however, clustering performance was highly dependent on the proportion of class-discriminating genes used in the clustering analysis. When applied to real epidemiologic data with repeated-measures, longitudinal gene expression measurements, TC-RPMM identified clusters that had strong biological and clinical significance.

CONCLUSIONS: Methods for clustering subjects based on temporal gene expression profiles are a high priority for molecular biology and bioinformatics research. Along these lines, the proposed TC-RPMM represents a promising new approach for analyzing time-course gene expression data.
