Dataset Information

Quantifying heterogeneity of expression data based on principal components.


ABSTRACT:

Motivation

The diversity of biological omics data provides richness of information, but also presents an analytic challenge. While there has been much methodological and theoretical development on the statistical handling of large volumes of biological data, far less attention has been devoted to characterizing their veracity and variability.

Results

We propose a method for statistically quantifying heterogeneity among multiple groups of datasets derived from different omics modalities across various experimental and/or disease conditions. It draws on strategies from analysis of variance and principal component analysis to reduce the dimensionality of the variability across multiple data groups. The resulting hypothesis-based inference procedure is demonstrated on synthetic data and on real data from a cell line study of growth factor responsiveness based on a factorial experimental design.
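
To make the general idea concrete, the following is a minimal illustrative sketch of how an ANOVA-style decomposition can be combined with principal component analysis. It is not the authors' gPCA procedure (see the linked repository for that). The statistic used here (the between-group share of the sum of squares along the leading principal components), the permutation test, and all function and parameter names (`between_group_fraction`, `permutation_pvalue`, `n_components`, `n_perm`) are assumptions chosen for illustration only.

```python
import numpy as np


def between_group_fraction(X, labels, n_components=2):
    """Share of the sum of squares along the top PCs that lies between groups.

    Illustrative statistic only; not the statistic defined in the paper.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)                      # center each feature
    # PCA via SVD: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # sample scores on the top PCs
    total_ss = (scores ** 2).sum()               # scores have zero grand mean
    between_ss = 0.0
    for g in np.unique(labels):
        grp = scores[labels == g]
        between_ss += len(grp) * (grp.mean(axis=0) ** 2).sum()
    return between_ss / total_ss


def permutation_pvalue(X, labels, n_components=2, n_perm=199, seed=0):
    """Permutation p-value for the statistic above (labels shuffled at random)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = between_group_fraction(X, labels, n_components)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if between_group_fraction(X, shuffled, n_components) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic example: two groups of 20 samples, 50 features,
# with group "b" shifted in the first 10 features.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=(20, 50))
group_b = rng.normal(0.0, 1.0, size=(20, 50))
group_b[:, :10] += 3.0
X = np.vstack([group_a, group_b])
labels = np.array(["a"] * 20 + ["b"] * 20)

stat, p_value = permutation_pvalue(X, labels)
print(f"between-group fraction: {stat:.3f}, permutation p-value: {p_value:.3f}")
```

In this sketch, a value near 1 means that most of the variation captured by the leading components separates the groups, while a value near 0 suggests the groups are largely indistinguishable along those components. The permutation test is one simple way to attach a p-value without distributional assumptions.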

Availability and implementation

Source code and datasets are freely available at https://github.com/yangzi4/gPCA.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Yang Z 

PROVIDER: S-EPMC6378942 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC1501050 | biostudies-literature
S-EPMC2367470 | biostudies-literature
S-EPMC7663540 | biostudies-literature
S-EPMC4408558 | biostudies-literature
S-EPMC2992445 | biostudies-literature
S-EPMC4890592 | biostudies-literature
S-EPMC4583840 | biostudies-literature
S-EPMC3872138 | biostudies-literature
S-EPMC3962763 | biostudies-literature
S-EPMC2435549 | biostudies-literature