Dataset Information

Comparison of small n statistical tests of differential expression applied to microarrays.

ABSTRACT:

Background

DNA microarrays provide data for genome-wide patterns of expression between observation classes. Microarray studies often have small sample sizes, however, due to cost constraints or specimen availability. This can lead to poor random error estimates and inaccurate statistical tests of differential expression. We compare the performance of the standard t-test, fold change, and four small n statistical test methods designed to circumvent these problems. We report results of various normalization methods for empirical microarray data and of various random error models for simulated data.

Results

Three Empirical Bayes methods (CyberT, BRB, and limma t-statistics) were the most effective statistical tests across simulated and both 2-colour cDNA and Affymetrix experimental data. The CyberT regularized t-statistic in particular was able to maintain expected false positive rates with simulated data showing high variances at low gene intensities, although at the cost of low true positive rates. The Local Pooled Error (LPE) test introduced a bias that lowered false positive rates below theoretically expected values and had lower power relative to the top performers. The standard two-sample t-test and fold change were also found to be sub-optimal for detecting differentially expressed genes. The generalized log transformation was shown to be beneficial in improving results with certain data sets, in particular high variance cDNA data.
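To illustrate why regularized t-statistics can behave better than the standard t-test at small n, the following is a minimal sketch of variance shrinkage. It is not the CyberT, BRB, or limma implementation; the function names and the `prior_var` and `prior_df` parameters are hypothetical, and in the actual methods the prior is estimated from the data (for example, from genes of similar intensity or from all genes on the array) rather than supplied by hand.

```python
import math
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled within-group sample variance for two groups."""
    na, nb = len(a), len(b)
    return ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)


def t_statistic(a, b, prior_var=None, prior_df=0.0):
    """Two-sample equal-variance t-statistic with optional shrinkage.

    With prior_df == 0 this is the standard t-statistic. With
    prior_df > 0 the gene-specific variance is pulled toward prior_var,
    weighted by degrees of freedom. Both parameters are illustrative
    assumptions, not values taken from the study.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    s2 = pooled_variance(a, b)
    if prior_df > 0:
        # Weighted average of the prior variance and the observed variance.
        s2 = (prior_df * prior_var + df * s2) / (prior_df + df)
    se = math.sqrt(s2 * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / se


# A gene whose three replicates per class happen to agree very closely.
control = [8.00, 8.01, 8.02]
treated = [8.10, 8.11, 8.12]

t_standard = t_statistic(control, treated)
t_shrunk = t_statistic(control, treated, prior_var=0.04, prior_df=4.0)
print(t_standard, t_shrunk)
```

With only three replicates per class, a variance estimate that is small by chance inflates the standard t-statistic. Shrinking the variance toward a prior value dampens that effect, which is the general idea the Empirical Bayes methods share.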

Conclusion

Pre-processing of data influences performance and the proper combination of pre-processing and statistical testing is necessary for obtaining the best results. All three Empirical Bayes methods assessed in our study are good choices for statistical tests for small n microarray studies for both Affymetrix and cDNA data. Choice of method for a particular study will depend on software and normalization preferences.

SUBMITTER: Murie C

PROVIDER: S-EPMC2674054 | biostudies-literature | 2009 Feb

REPOSITORIES: biostudies-literature


Publications

Comparison of small n statistical tests of differential expression applied to microarrays.

Murie C, Woody O, Lee AY, Nadon R

BMC Bioinformatics, 2009 Feb 3


PMID: 19192265

Similar Datasets

Project description: Microarray technology has enabled the measurement of comprehensive transcriptomic information. However, each data entry may reflect trivial individual differences among samples and may also contain technical noise. Therefore, the certainty of each observed difference should be confirmed at an early step of the analysis, and statistical tests are frequently used for this purpose. Since a microarray measures a huge number of genes and the results are processed simultaneously, concerns about multiplicity have been raised regarding these tests. To deal with these problems, several methodologies have been proposed, making the tests very conservative. Indeed, arbitrary tuning of the test threshold has also been introduced to relax the test conditions. However, neither the appropriateness of the multiplicity problems nor that of the compensation methods has been confirmed. The appropriateness of the compensation methods was checked by comparing the premises of the methodologies with the characteristics observed in real data from two typical microarray platforms. Normality was observed in within-group data variations, supporting the application of parametric tests. However, genes displayed their own tendencies in the magnitude of variations, and the distributions of P-values were rather complex and varied; these characteristics are inconsistent with the premises of the compensation methodologies. Additionally, the appropriateness of the proposed multiplicities is reconsidered. When we observe differences in the transcriptome, the family-wise error rate should not be considered, since analyses at higher levels would not be influenced by a few false positives among the huge number of true findings. Likewise, concerns about the false discovery rate are not suitable for point null hypotheses on expression levels, since true null hypotheses should be rare, in contradiction to the premise of the methodology.
Although compensation methods have been recommended to deal with the problem of multiplicity, the compensations are actually inappropriate for many applications of transcriptome analysis. Compensations are not only unnecessary but will also increase the occurrence of false negative errors, and arbitrary adjustment of the threshold damages the objectivity of the tests. Rather, the results of parametric tests should be evaluated directly.
