Dataset Information

Analysis of high dimensional data using pre-defined set and subset information, with applications to genomic data.

ABSTRACT:

Background

Based on available biological information, genomic data can often be partitioned into pre-defined sets (e.g. pathways) and subsets within sets. Biologists are often interested in determining whether some pre-defined sets of variables (e.g. genes) are differentially expressed under varying experimental conditions. Several procedures are available in the literature for making such determinations; however, they do not take into account information regarding the subsets within each set. Moreover, variables (e.g. genes) belonging to a set or a subset are potentially correlated, yet such information is often ignored and univariate methods are used. This may result in a loss of power and/or an inflated false positive rate.
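
To make the set/subset layout concrete, the sketch below represents a pathway as a mapping from subset names to gene identifiers and converts it to row indices of an expression matrix. The function build_index, the "__all__" key, and the gene and pathway names are hypothetical; real annotations would come from a pathway database rather than being typed in.

```python
from typing import Dict, List

# Hypothetical layout: set (pathway) name -> {subset name -> gene identifiers}.
GeneSets = Dict[str, Dict[str, List[str]]]


def build_index(sets: GeneSets, genes: List[str]) -> Dict[str, Dict[str, List[int]]]:
    """Map each subset, and each whole set, to row indices of an expression matrix.

    genes lists the gene identifiers in row order. The whole set is stored under
    the key "__all__" as the union of its subsets, so every subset is nested
    inside its set. Subsets may overlap.
    """
    position = {g: i for i, g in enumerate(genes)}
    index: Dict[str, Dict[str, List[int]]] = {}
    for set_name, subsets in sets.items():
        index[set_name] = {}
        members = set()
        for sub_name, sub_genes in subsets.items():
            missing = [g for g in sub_genes if g not in position]
            if missing:
                raise KeyError(f"{set_name}/{sub_name}: genes not in data: {missing}")
            rows = sorted(position[g] for g in sub_genes)
            index[set_name][sub_name] = rows
            members.update(rows)
        index[set_name]["__all__"] = sorted(members)
    return index


# Toy example with made-up identifiers.
genes = ["G1", "G2", "G3", "G4", "G5"]
pathways = {"pathway_A": {"sub_1": ["G1", "G2"], "sub_2": ["G2", "G4"]}}
index = build_index(pathways, genes)
```

Storing subsets as index lists lets a set-level test and its subset-level tests read the same rows of one expression matrix, which keeps the nesting of subsets within sets explicit.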

Results

We introduce a multiple testing-based methodology that makes use of available information regarding biologically relevant subsets within each pre-defined set of variables while exploiting the underlying dependence structure among the variables. Using this methodology, a biologist can not only determine whether a set of variables is differentially expressed between two experimental conditions but can also test whether specific subsets within a significant set are also significant.
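
The "test the set first, then its subsets" idea can be illustrated with a minimal NumPy sketch. It is not the authors' procedure: the sum of squared Welch t-statistics, the permutation p-values, and the Bonferroni-style split of alpha across sets and subsets are illustrative choices, and all function names are hypothetical. Permuting whole sample columns keeps the gene-gene correlation within each permuted dataset, which is one simple way to respect the dependence structure.

```python
import numpy as np


def welch_t(x, y):
    """Welch two-sample t-statistic for every row (gene); columns are samples."""
    diff = x.mean(axis=1) - y.mean(axis=1)
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1] + y.var(axis=1, ddof=1) / y.shape[1])
    return diff / se


def perm_pvalues(data, n1, groups, n_perm=999, seed=0):
    """Permutation p-values of the sum-of-squared-t statistic for each row group.

    The first n1 columns of data are condition 1. Permuting whole sample columns
    leaves the gene-gene dependence intact, and no covariance matrix is inverted.
    """
    rng = np.random.default_rng(seed)

    def stats(cols):
        t2 = welch_t(data[:, cols[:n1]], data[:, cols[n1:]]) ** 2
        return np.array([t2[g].sum() for g in groups])

    cols = np.arange(data.shape[1])
    observed = stats(cols)
    exceed = np.ones(len(groups))  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        exceed += stats(rng.permutation(cols)) >= observed
    return exceed / (n_perm + 1)


def hierarchical_test(data, n1, sets, alpha=0.05, n_perm=999, seed=0):
    """Test each set; test the subsets of a set only if the set is rejected.

    sets maps set name -> {subset name -> row indices}; a set's rows are the
    union of its subsets. Each of the K sets is tested at alpha / K and each of
    the m subsets of a rejected set at alpha / (K * m). A subset null is implied
    by its set null, so a union bound over the K families keeps FWER <= alpha.
    """
    K = len(sets)
    rejected = {}
    for name, subsets in sets.items():
        sub_rows = [np.asarray(rows) for rows in subsets.values()]
        groups = [np.unique(np.concatenate(sub_rows))] + sub_rows
        p = perm_pvalues(data, n1, groups, n_perm, seed)
        if p[0] <= alpha / K:
            m = len(subsets)
            rejected[name] = [s for s, ps in zip(subsets, p[1:]) if ps <= alpha / (K * m)]
    return rejected
```

The return value maps each rejected set to the list of its rejected subsets (possibly empty). Gating the subset tests on the set-level rejection is what lets the subset tests spend the set's share of alpha without inflating the family-wise error rate.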

Conclusions

The proposed methodology (a) is easy to implement, (b) does not require inverting potentially singular covariance matrices, (c) controls the family-wise error rate (FWER) at the desired nominal level, and (d) is robust to the underlying distribution and covariance structure. Although, for simplicity of exposition, the methodology is described for microarray gene expression data, it is applicable to any high-dimensional data, such as mRNA-seq data and CpG methylation data.
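
Point (b) matters because high-dimensional data usually have far more genes than arrays. The sketch below, using simulated data and a hypothetical helper pooled_covariance, shows that the pooled sample covariance is then rank-deficient, so a Hotelling-type statistic, which needs its inverse, is undefined, while a sum of squared per-gene t-statistics can still be computed. It only illustrates the singularity problem; it is not the statistic used in the paper.

```python
import numpy as np


def pooled_covariance(x, y):
    """Pooled within-group covariance of genes; x, y have shape (samples, genes)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return (xc.T @ xc + yc.T @ yc) / (x.shape[0] + y.shape[0] - 2)


rng = np.random.default_rng(0)
n1, n2, p = 5, 5, 50  # far more genes than arrays (simulated data)
x = rng.normal(size=(n1, p))
y = rng.normal(size=(n2, p))

S = pooled_covariance(x, y)
rank = np.linalg.matrix_rank(S)
# rank is at most n1 + n2 - 2 = 8 < p, so S has no inverse and Hotelling's
# T^2 = (n1 * n2 / (n1 + n2)) * d' S^{-1} d cannot be computed.

# A statistic that uses only per-gene variances (the diagonal of S) still works.
d = x.mean(axis=0) - y.mean(axis=0)
t = d / np.sqrt(np.diag(S) * (1 / n1 + 1 / n2))
sum_t2 = float(np.sum(t ** 2))
```

Because each centred group of n samples spans at most n - 1 dimensions, no amount of numerical care makes S invertible here; avoiding the inverse altogether is the practical way around it.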

SUBMITTER: Guo W

PROVIDER: S-EPMC3443674 | biostudies-literature | 2012 Jul

REPOSITORIES: biostudies-literature

Publications

Analysis of high dimensional data using pre-defined set and subset information, with applications to genomic data.

Guo W, Yang M, Xing C, Peddada SD

BMC Bioinformatics, 2012 Jul 24

PMID: 22827252

Similar Datasets

High-dimensional genomic data bias correction and data integration using MANCIE.
| S-EPMC4833864 | biostudies-other

A decision-theory approach to interpretable set analysis for high-dimensional data.
| S-EPMC3927844 | biostudies-literature

Discovering a sparse set of pairwise discriminating features in high-dimensional data.
| S-EPMC8599814 | biostudies-literature

Lithographically defined three-dimensional pore-patterned carbon with nitrogen doping for high-performance ultrathin supercapacitor applications.
| S-EPMC4066249 | biostudies-literature

New Analysis Framework Incorporating Mixed Mutual Information and Scalable Bayesian Networks for Multimodal High Dimensional Genomic and Epigenomic Cancer Data.
| S-EPMC7314938 | biostudies-literature

Asymptotic conditional singular value decomposition for high-dimensional genomic data.
| S-EPMC3165001 | biostudies-literature

Shrinkage-based diagonal discriminant analysis and its applications in high-dimensional data.
| S-EPMC2794982 | biostudies-literature

An Information-Based Approach for Mediation Analysis on High-Dimensional Metagenomic Data.
| S-EPMC7083016 | biostudies-literature

High dimensional association detection in large-scale genomic data
2020-11-18 | GSE156074 | GEO

Network-regularized high-dimensional Cox regression for analysis of genomic data.
| S-EPMC4549005 | biostudies-literature