
Dataset Information


A graphical model method for integrating multiple sources of genome-scale data.


ABSTRACT: Making effective use of multiple data sources is a major challenge in modern bioinformatics. Genome-wide data such as measures of transcription factor binding, gene expression, and sequence conservation, which are used to identify binding regions and genes that are important to major biological processes such as development and disease, can be difficult to use together due to the different biological meanings and statistical distributions of the heterogeneous data types, but each can provide valuable information for understanding the processes under study. Here we present methods for integrating multiple data sources to gain a more complete picture of gene regulation and expression. Our goal is to identify genes and cis-regulatory regions which play specific biological roles. We describe a graphical mixture model approach for data integration, examine the effect of using different model topologies, and discuss methods for evaluating the effectiveness of the models. Model fitting is computationally efficient and produces results which have clear biological and statistical interpretations. The Hedgehog and Dorsal signaling pathways in Drosophila, which are critical in embryonic development, are used as examples.
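To make the idea of a mixture model over several data sources concrete, here is a minimal sketch. It is not the authors' model: it assumes the simplest possible topology, in which a single latent binary class per gene (for example "pathway target" versus "background") generates every data source independently, and it assumes each source is Gaussian within a class. The function name `fit_two_class_mixture`, the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_class_mixture(rows, n_iter=50):
    """Fit a two-class mixture over several data sources with EM.

    rows: list of equal-length tuples, one value per data source per gene.
    Assumption: sources are conditionally independent given the latent class.
    Returns (prior of class 1, per-class means, per-class sds, posteriors).
    """
    n = len(rows)
    cols = list(zip(*rows))
    # Start class 0 at each column's minimum and class 1 at its maximum,
    # so that class 1 ends up as the "high signal" class.
    means = [[min(c) for c in cols], [max(c) for c in cols]]
    sds = [[1.0] * len(cols), [1.0] * len(cols)]
    prior1 = 0.5
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each gene belongs to class 1.
        for i, row in enumerate(rows):
            like0, like1 = 1.0 - prior1, prior1
            for j, x in enumerate(row):
                like0 *= normal_pdf(x, means[0][j], sds[0][j])
                like1 *= normal_pdf(x, means[1][j], sds[1][j])
            total = like0 + like1
            post[i] = like1 / total if total > 0 else 0.5
        # M-step: posterior-weighted means and standard deviations per class.
        for k in (0, 1):
            weights = [p if k == 1 else 1.0 - p for p in post]
            wsum = max(sum(weights), 1e-9)
            for j, col in enumerate(cols):
                mean = sum(w * x for w, x in zip(weights, col)) / wsum
                var = sum(w * (x - mean) ** 2 for w, x in zip(weights, col)) / wsum
                means[k][j] = mean
                sds[k][j] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
        prior1 = min(max(sum(post) / n, 1e-6), 1.0 - 1e-6)
    return prior1, means, sds, post


# Simulated example: two hypothetical data sources (say, a binding score
# and an expression score) for 200 background genes and 50 target genes.
random.seed(0)
background = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
targets = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(50)]
prior1, means, sds, post = fit_two_class_mixture(background + targets)
print("estimated fraction of target genes:", round(prior1, 2))
```

The posterior probabilities in `post` give each gene a score that combines both sources. Other topologies, such as one latent variable per data source linked to a shared parent, would change the E-step factorization but not the overall EM structure.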

SUBMITTER: Dvorkin D 

PROVIDER: S-EPMC4867227 | biostudies-literature | 2013 Aug

REPOSITORIES: biostudies-literature


Publications

A graphical model method for integrating multiple sources of genome-scale data.

Daniel Dvorkin, Brian Biehs, Katerina Kechris

Statistical Applications in Genetics and Molecular Biology, 2013-08-01, issue 4



Similar Datasets

| S-EPMC3865369 | biostudies-literature
| S-EPMC3123338 | biostudies-literature
| S-EPMC3691143 | biostudies-literature
| S-EPMC3584913 | biostudies-literature
| S-EPMC3236839 | biostudies-literature
| S-EPMC7125090 | biostudies-literature
| S-EPMC2694152 | biostudies-literature
| S-EPMC6546887 | biostudies-literature
| S-EPMC2648781 | biostudies-literature
| S-EPMC2847756 | biostudies-literature