
Dataset Information


Principal component analysis-based filtering improves detection for Affymetrix gene expression arrays.


ABSTRACT: Gene expression array technology has reached the stage of being routinely used to study clinical samples in search of diagnostic and prognostic biomarkers. Due to the nature of array experiments, which examine the expression of tens of thousands of genes simultaneously, the number of null hypotheses is large. Hence, multiple testing correction is often necessary to control the number of false positives. However, multiple testing correction can lead to low statistical power in detecting genes that are truly differentially expressed. Filtering out non-informative genes allows for reduction in the number of null hypotheses. While several filtering methods have been suggested, the appropriate way to perform filtering is still debatable. We propose a new filtering strategy for Affymetrix GeneChips®, based on principal component analysis of probe-level gene expression data. Using a wholly defined spike-in data set and one from a diabetes study, we show that filtering by the proportion of variation accounted for by the first principal component (PVAC) provides increased sensitivity in detecting truly differentially expressed genes while controlling false discoveries. We demonstrate that PVAC exhibits equal or better performance than several widely used filtering methods. Furthermore, a data-driven approach that guides the selection of the filtering threshold value is also proposed.
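The PVAC statistic described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that each probe set is given as a matrix of log2 probe-level intensities, with arrays as observations (rows) and probes as variables (columns). It also assumes that each probe is mean-centered across arrays before PCA. The function names `pvac` and `filter_probe_sets` and the fixed `threshold` argument are hypothetical. The paper instead proposes a data-driven way to choose the threshold.

```python
import numpy as np


def pvac(probe_matrix: np.ndarray) -> float:
    """Proportion of variance accounted for by the first principal component.

    Assumed layout: shape (n_arrays, n_probes), log2 probe intensities for a
    single probe set; arrays are observations, probes are variables.
    """
    x = np.asarray(probe_matrix, dtype=float)
    # Center each probe (column) across arrays, as in covariance-based PCA.
    centered = x - x.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the PC variances.
    s = np.linalg.svd(centered, compute_uv=False)
    total = float(np.sum(s ** 2))
    if total == 0.0:
        # No variation across arrays at all: treat as non-informative.
        return 0.0
    return float(s[0] ** 2 / total)


def filter_probe_sets(probe_sets: dict, threshold: float) -> list:
    """Return IDs of probe sets whose PVAC is at or above `threshold`.

    `threshold` is an illustrative placeholder, not a recommended value.
    """
    return [pid for pid, mat in probe_sets.items() if pvac(mat) >= threshold]
```

The intuition behind the sketch is as follows. When the probes in a probe set respond coherently to a real expression change, most of their joint variation lies along one direction, so PVAC is close to 1. Probe sets dominated by unrelated probe-level noise spread their variance over many components and score lower. Only the probe sets that pass the filter would then be carried into testing and multiple testing correction, which reduces the number of null hypotheses.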

SUBMITTER: Lu J 

PROVIDER: S-EPMC3141272 | biostudies-literature | 2011 Jul

REPOSITORIES: biostudies-literature


Publications

Principal component analysis-based filtering improves detection for Affymetrix gene expression arrays.

Jun Lu, Robnet T. Kerns, Shyamal D. Peddada, Pierre R. Bushel

Nucleic Acids Research, 2011-04-27, issue 13



Similar Datasets

| S-EPMC2216046 | biostudies-literature
| S-EPMC2585104 | biostudies-literature
| S-EPMC5912177 | biostudies-literature
| S-EPMC4928327 | biostudies-literature
| S-EPMC2885369 | biostudies-literature
| S-EPMC2910027 | biostudies-literature
| S-EPMC6118369 | biostudies-literature
| S-EPMC1762343 | biostudies-literature
| S-EPMC1316121 | biostudies-literature
| S-EPMC3176196 | biostudies-literature