Dataset Information

Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line.

ABSTRACT: Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.

SUBMITTER: Blayney JK

PROVIDER: S-EPMC5041471 | biostudies-literature | 2016 Sep

REPOSITORIES: biostudies-literature

Publications

Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line.

Jaine K Blayney, Timothy Davison, Nuala McCabe, Steven Walker, Karen Keating, Thomas Delaney, Caroline Greenan, Alistair R Williams, W Glenn McCluggage, Amanda Capes-Davis, D Paul Harkin, Charlie Gourley, Richard D Kennedy

Nucleic Acids Research, 2016 Jun 28, issue 17

PMID: 27353327

Similar Datasets

Prior Knowledge Transfer Across Transcriptional Datasets Using Compositional Statistics
2016-11-08 | GSE73638 | GEO

Prior Knowledge Transfer Across Transcriptional Datasets Using Compositional Statistics [Tumor]
2016-11-08 | GSE73551 | GEO

Prior Knowledge Transfer Across Transcriptional Datasets Using Compositional Statistics [Cell lines]
2016-11-08 | GSE73637 | GEO

Prior Knowledge Transfer Across Transcriptional Datasets Using Compositional Statistics
PRJNA297464 | ENA

Ovarian cancer statistics, 2018.
S-EPMC6621554 | biostudies-literature

Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets.
S-EPMC8361636 | biostudies-literature

Omnibus and robust deconvolution scheme for bulk RNA sequencing data integrating multiple single-cell reference sets and prior biological knowledge.
S-EPMC9525013 | biostudies-literature

Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets.
S-EPMC5345826 | biostudies-literature

Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data.
S-EPMC6937257 | biostudies-literature

A molecular prior distribution for Bayesian inference based on Wilson statistics.
S-EPMC9233040 | biostudies-literature