Dataset Information

AICM: A Genuine Framework for Correcting Inconsistency Between Large Pharmacogenomics Datasets.


ABSTRACT: Inconsistency among open pharmacogenomics datasets produced by different studies limits their use in many tasks, such as biomarker discovery. Investigation of multiple pharmacogenomics datasets confirmed that the pairwise correlation of sensitivity data between drugs, or rows, across different studies (drug-wise) is relatively low, while the pairwise correlation of sensitivity data between cell lines, or columns, across different studies (cell-wise) is comparatively strong. This observation, common to multiple pharmacogenomics datasets, suggests that a subtle consistency exists among the studies (i.e., strong cell-wise correlation). However, substantial noise is also present (i.e., weak drug-wise correlation), which has prevented researchers from comfortably using the data directly. Motivated by this observation, we propose a novel framework for addressing the inconsistency between large-scale pharmacogenomics datasets. Our method can significantly boost the drug-wise correlation and can easily be applied to the re-summarized and normalized datasets proposed by others. We also evaluate our algorithm against many different criteria to demonstrate that the corrected datasets are not only consistent but also biologically meaningful. Finally, we propose extending our main algorithm into a framework so that, as more datasets become publicly available, it can offer "ground-truth" guidance for reference.
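
To make the two correlation measures in the abstract concrete, the following is a minimal sketch and not the AICM algorithm itself. It assumes each study is a drugs-by-cell-lines sensitivity matrix over the same drugs and cell lines, that "drug-wise" means correlating a drug's row in one study with the same row in the other, and that "cell-wise" means doing the same for columns. The function names and the synthetic toy data are hypothetical and chosen only for illustration; the toy data are constructed so that drugs differ far more than cell lines do, and they carry no biological meaning.

```python
import numpy as np

def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of two equally shaped matrices."""
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = (a_c * b_c).sum(axis=1)
    den = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
    return num / den

def drug_wise_correlation(study_a, study_b):
    # One value per drug (row): agreement across cell lines between the studies.
    return rowwise_pearson(study_a, study_b)

def cell_wise_correlation(study_a, study_b):
    # One value per cell line (column): agreement across drugs between the studies.
    return rowwise_pearson(study_a.T, study_b.T)

# Synthetic toy data (assumption, not real measurements): large differences
# between drugs, small differences between cell lines, independent noise per study.
rng = np.random.default_rng(0)
n_drugs, n_cells = 20, 30
drug_effect = rng.normal(0.0, 2.0, size=(n_drugs, 1))
cell_effect = rng.normal(0.0, 0.2, size=(1, n_cells))
signal = drug_effect + cell_effect
study_a = signal + rng.normal(0.0, 0.5, size=signal.shape)
study_b = signal + rng.normal(0.0, 0.5, size=signal.shape)

drug_wise = drug_wise_correlation(study_a, study_b)
cell_wise = cell_wise_correlation(study_a, study_b)
print("mean drug-wise correlation:", drug_wise.mean())
print("mean cell-wise correlation:", cell_wise.mean())
```

On real data, the two studies would first need to be restricted to their shared drugs and cell lines, and missing values would need to be handled before computing either measure.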

SUBMITTER: Hu ZT 

PROVIDER: S-EPMC6417811 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

AICM: A Genuine Framework for Correcting Inconsistency Between Large Pharmacogenomics Datasets.

Hu ZT, Ye Y, Newbury PA, Huang H, Chen B

Pacific Symposium on Biocomputing, 2019-01-01

Similar Datasets

S-EPMC4237165 | biostudies-literature
S-EPMC3700978 | biostudies-literature
S-EPMC5580432 | biostudies-literature
S-EPMC4224478 | biostudies-literature
S-EPMC5975674 | biostudies-literature
S-EPMC8448528 | biostudies-literature
S-EPMC8693048 | biostudies-literature
S-EPMC6777421 | biostudies-literature
S-EPMC8659367 | biostudies-literature
S-EPMC5015807 | biostudies-other