
Dataset Information


A data integration methodology for systems biology.


ABSTRACT: Different experimental technologies measure different aspects of a system and to differing depth and breadth. High-throughput assays have inherently high false-positive and false-negative rates. Moreover, each technology includes systematic biases of a different nature. These differences make network reconstruction from multiple data sets difficult and error-prone. Additionally, because of the rapid rate of progress in biotechnology, there is usually no curated exemplar data set from which one might estimate data integration parameters. To address these concerns, we have developed data integration methods that can handle multiple data sets differing in statistical power, type, size, and network coverage without requiring a curated training data set. Our methodology is general in purpose and may be applied to integrate data from any existing and future technologies. Here we outline our methods and then demonstrate their performance by applying them to simulated data sets. The results show that these methods select true-positive data elements much more accurately than classical approaches. In an accompanying companion paper, we demonstrate the applicability of our approach to biological data. We have integrated our methodology into a free open source software package named POINTILLIST.

SUBMITTER: Hwang D 

PROVIDER: S-EPMC1297682 | biostudies-literature | 2005 Nov

REPOSITORIES: biostudies-literature


Publications

A data integration methodology for systems biology.

Daehee Hwang, Alistair G. Rust, Stephen Ramsey, Jennifer J. Smith, Deena M. Leslie, Andrea D. Weston, Pedro de Atauri, John D. Aitchison, Leroy Hood, Andrew F. Siegel, Hamid Bolouri

Proceedings of the National Academy of Sciences of the United States of America, 2005 Nov 21, issue 48



Similar Datasets

| S-EPMC1297683 | biostudies-literature
| S-EPMC3424966 | biostudies-literature
| S-EPMC3008707 | biostudies-literature
| S-EPMC2748085 | biostudies-literature
| S-EPMC2912892 | biostudies-literature
| S-EPMC3047436 | biostudies-literature
| S-EPMC6525352 | biostudies-literature
| S-EPMC4539887 | biostudies-literature
| S-EPMC2570191 | biostudies-literature
| S-EPMC2802734 | biostudies-literature