Dataset Information

Discovering transcriptional modules by Bayesian data integration.

ABSTRACT:

Motivation

We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets.

Results

We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs.

Availability

The code for the work presented in this article is available from the authors on request.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Savage RS

PROVIDER: S-EPMC2881394 | biostudies-literature | 2010 Jun

REPOSITORIES: biostudies-literature

Publications

Discovering transcriptional modules by Bayesian data integration.

Savage RS, Ghahramani Z, Griffin JE, de la Cruz BJ, Wild DL

Bioinformatics (Oxford, England), 2010 Jun 1, issue 12

PMID: 20529901

Similar Datasets

Project description: We present an exploratory method for discovering likely misconceptions from multiple-choice concept test data, as well as preliminary evidence that this method recovers known misconceptions from real student responses. Our procedure is based on a Bayesian implementation of the Multidimensional Nominal Categories IRT model (MNCM) combined with standard factor-analytic rotation methods; by analyzing student responses at the level of individual distractors rather than at the level of entire questions, this approach is able to highlight multiple likely misconceptions for subsequent investigation without requiring any manual labeling of test content. We explore the performance of the Bayesian MNCM on synthetic data and find that it is able to recover multidimensional item parameters consistently at achievable sample sizes. These studies demonstrate the method's robustness to overfitting and its ability to perform automatic dimensionality assessment and selection. The method also compares favorably to existing IRT software implementing marginal maximum likelihood estimation, which we use as a validation benchmark. We then apply our method to approximately 10,000 students' responses to a research-designed concept test: the Force Concept Inventory. In addition to a broad first dimension strongly correlated with overall test score, we discover thirteen additional dimensions which load on smaller sets of distractors; we discuss two as examples, showing that these are consistent with already-known misconceptions in Newtonian mechanics.
While work remains to validate our findings, our hope is that future applications of this method could aid in the refinement of existing concept inventories or the development of new ones, enable the discovery of previously unknown student misconceptions across a variety of disciplines, and, by leveraging the method's ability to quantify the prevalence of particular misconceptions, provide opportunities for targeted instruction at both the individual and classroom level.
