Dataset Information

Test set bias affects reproducibility of gene signatures.

ABSTRACT:

Motivation

Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur when any cross-sample normalization is used before clinical prediction.

Results

Using a set of curated, publicly available breast cancer microarray experiments, we demonstrate that results from existing gene signatures that rely on normalizing test data may be irreproducible when the patient population changes in composition or size. As an alternative, we examine gene signatures that rely on ranks from the data and show why signatures using rank-based features can avoid test set bias while maintaining highly accurate classification, even across platforms.
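The effect described above can be illustrated with a minimal sketch on simulated data. It is not the authors' pipeline: quantile normalization stands in here as one example of a cross-sample normalization, the pairwise gene comparison is only an illustrative rank-based feature, and all names (`quantile_normalize`, `within_sample_ranks`, `cohort_a`, `cohort_b`) are hypothetical.

```python
import numpy as np


def quantile_normalize(X):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Each value is replaced by the mean, across all samples, of the sorted
    values at the same rank. The output for one sample therefore depends
    on every other sample normalized alongside it.
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    reference = np.sort(X, axis=0).mean(axis=1)
    return reference[ranks]


def within_sample_ranks(x):
    """Rank genes within a single sample (0 = lowest expression)."""
    return np.argsort(np.argsort(x))


# Simulated expression values (assumption: no real data is used here).
rng = np.random.default_rng(0)
n_genes = 50
patient = rng.normal(8.0, 2.0, size=n_genes)
cohort_a = rng.normal(8.0, 2.0, size=(n_genes, 5))
cohort_b = rng.normal(10.0, 3.0, size=(n_genes, 20))

# Normalize the same patient together with two different cohorts.
norm_a = quantile_normalize(np.column_stack([patient, cohort_a]))[:, 0]
norm_b = quantile_normalize(np.column_stack([patient, cohort_b]))[:, 0]

# The patient's normalized values change with the accompanying cohort.
normalized_values_differ = not np.allclose(norm_a, norm_b)

# A rank-based feature: is gene 0 expressed above gene 1 in this patient?
pair_raw = bool(patient[0] > patient[1])
pair_a = bool(norm_a[0] > norm_a[1])
pair_b = bool(norm_b[0] > norm_b[1])

print("normalized values differ:", normalized_values_differ)
print("pairwise feature agrees:", pair_raw == pair_a == pair_b)
```

Any predictor that thresholds the normalized values can give a different answer for the same patient depending on the cohort, whereas a feature built only from the within-sample ordering of genes is computed from that patient's data alone.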

Availability and implementation

The code, data and instructions necessary to reproduce our entire analysis are available at https://github.com/prpatil/testsetbias.

SUBMITTER: Patil P

PROVIDER: S-EPMC4495301 | biostudies-literature | 2015 Jul

REPOSITORIES: biostudies-literature

Publications

Test set bias affects reproducibility of gene signatures.

Prasad Patil, Pierre-Olivier Bachant-Winner, Benjamin Haibe-Kains, Jeffrey T Leek

Bioinformatics (Oxford, England), 2015 Mar 18, issue 14

PMID: 25788628

Similar Datasets

Size matters: how sample size affects the reproducibility and specificity of gene set analysis. | S-EPMC6805317 | biostudies-literature

DSigDB: drug signatures database for gene set analysis. | S-EPMC4668778 | biostudies-literature

Longitudinal linear combination test for gene set analysis. | S-EPMC6902471 | biostudies-literature

Reproducibility of Gene Expression Signatures in Diffuse Large B-Cell Lymphoma. | S-EPMC8909016 | biostudies-literature

The Molecular Signatures Database (MSigDB) hallmark gene set collection. | S-EPMC4707969 | biostudies-literature

Camera: a competitive gene set test accounting for inter-gene correlation. | S-EPMC3458527 | biostudies-literature