De-correlating expression in gene-set analysis.
ABSTRACT: MOTIVATION: Group-wise pattern analysis of genes, known as gene-set analysis (GSA), addresses the differential expression pattern of biologically pre-defined gene sets. GSA exhibits high statistical power and has revealed many novel biological processes associated with specific phenotypes. In most cases, however, GSA relies on the invalid assumption that the members of each gene set are sampled independently, which increases the number of false predictions. RESULTS: We propose an algorithm, termed DECO, to remove (or alleviate) the bias caused by the correlation of the expression data in GSAs. This is accomplished through the eigenvalue decomposition of covariance matrices and a series of linear transformations of the data. In particular, moderate de-correlation methods that truncate or re-scale eigenvalues are proposed for a more reliable analysis. Tests on simulated and real experimental data show that DECO effectively corrects the correlation structure of gene expression and improves the prediction accuracy (specificity and sensitivity) of both gene- and sample-randomizing GSA methods. AVAILABILITY: The MATLAB code and the tested data sets are available at ftp://deco.nims.re.kr/pub or from the author.
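ILLUSTRATION: The abstract does not spell out the DECO transformations, so the following is only a minimal sketch of the general idea it names: de-correlating the genes in a set through the eigenvalue decomposition of their covariance matrix. It is written in Python with NumPy rather than the authors' MATLAB, and it is not the published DECO code. The function name `decorrelate`, the parameter `floor_ratio`, and the specific flooring rule used for "moderate" de-correlation are assumptions made for illustration.

```python
import numpy as np


def decorrelate(X, floor_ratio=0.0):
    """De-correlate the rows (genes) of a genes-by-samples matrix X.

    floor_ratio is a hypothetical knob, not a DECO parameter: eigenvalues
    below floor_ratio * (largest eigenvalue) are raised to that level
    before inversion, so weak directions are not amplified without bound.
    floor_ratio=0.0 gives full whitening and needs a full-rank covariance
    (more samples than genes).
    """
    cov = np.atleast_2d(np.cov(X))        # genes x genes sample covariance
    evals, evecs = np.linalg.eigh(cov)    # eigendecomposition of a symmetric matrix
    evals = np.clip(evals, 0.0, None)     # guard against tiny negative round-off
    floored = np.maximum(evals, floor_ratio * evals.max())
    # Symmetric transform V diag(lambda^-1/2) V^T, so each output row
    # still corresponds to one gene.
    W = evecs @ np.diag(floored ** -0.5) @ evecs.T
    return W @ X                          # linear transformation of the data


# Synthetic example (assumed data): 5 correlated genes, 200 samples.
rng = np.random.default_rng(0)
mixing = rng.standard_normal((5, 5)) + 3.0 * np.eye(5)
X = mixing @ rng.standard_normal((5, 200))

Z = decorrelate(X)                        # full de-correlation
Z_moderate = decorrelate(X, floor_ratio=0.5)

print(np.round(np.cov(Z), 3))
```

With `floor_ratio=0.0` the sample covariance of the transformed genes becomes the identity matrix up to numerical error. A positive `floor_ratio` leaves some residual correlation in exchange for not inflating low-variance directions, which is one plausible reading of the "moderate" variants mentioned in the abstract.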
SUBMITTER: Nam D
PROVIDER: S-EPMC2935420 | biostudies-literature | 2010 Sep
REPOSITORIES: biostudies-literature