Dataset Information

A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

ABSTRACT: Cross-validation (CV) is a technique for assessing how well a model generalizes to unseen data. It relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, represents unseen data well. This assumption does not hold when samples are obtained from different experimental conditions and the goal is to learn regulatory relationships among genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or used in other applications). We compared the performance of a regression-based method for gene expression prediction as estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of a test set from a training set and showed that this measure is predictive of the performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that the performance of different gene expression prediction methods can be better evaluated using this method.
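
To make the RCV/CCV contrast in the abstract concrete, the following is a minimal sketch of how the two kinds of fold assignment might be built. It is not the authors' implementation: the use of k-means as the clustering step, the choice of sample features to cluster on, and all function names (`random_folds`, `kmeans_labels`, `clustered_folds`) are assumptions made for illustration only.

```python
import random


def random_folds(n_samples, k, seed=0):
    """Random CV (RCV): shuffle sample indices and deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice); one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustered_folds(points, k, seed=0):
    """Clustering-based CV (CCV): each fold is one whole cluster of samples."""
    labels = kmeans_labels(points, k, seed=seed)
    folds = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    return [f for f in folds if f]  # drop clusters that ended up empty


# Toy data: 30 samples drawn around three hypothetical "conditions".
rng = random.Random(1)
offsets = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
points = [(ox + rng.gauss(0, 1), oy + rng.gauss(0, 1))
          for ox, oy in offsets for _ in range(10)]

rcv = random_folds(len(points), 3)  # test folds mix samples from all conditions
ccv = clustered_folds(points, 3)    # test folds follow the cluster structure
```

In either scheme, each fold serves once as the test set while the remaining folds form the training set. Under RCV a held-out fold typically contains samples that resemble the training samples, whereas under CCV the held-out fold is a cluster the model has not seen, which is the situation the abstract argues is closer to predicting under new experimental conditions.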

SUBMITTER: Tabe-Bordbar S

PROVIDER: S-EPMC5920056 | biostudies-literature | 2018 Apr

REPOSITORIES: biostudies-literature

Publications

A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

Shayan Tabe-Bordbar, Amin Emad, Sihai Dave Zhao, Saurabh Sinha

Scientific Reports, 2018-04-26, issue 1

PMID: 29700343

Similar Datasets

Estimating misclassification error: a closer look at cross-validation based methods. | S-EPMC3556102 | biostudies-literature

Assessing regulatory information in developmental gene regulatory networks. | S-EPMC5468647 | biostudies-literature

Cross-validation pitfalls when selecting and assessing regression and classification models. | S-EPMC3994246 | biostudies-other

Kinome Profiling of Regulatory T Cells: A Closer Look into a Complex Intracellular Network. | S-EPMC4755507 | biostudies-literature

Thermally gated liposomes: a closer look. | S-EPMC2765578 | biostudies-literature

A closer look into NADPH oxidase inhibitors: Validation and insight into their mechanism of action. | S-EPMC7042484 | biostudies-literature

A closer look at 'Cheap White' cigarettes. | S-EPMC5036225 | biostudies-literature

A closer look at the apparent correlation of structural and functional connectivity in excitable neural networks. | S-EPMC4297952 | biostudies-literature

A Closer Look at Anandamide Interaction With TRPV1. | S-EPMC7385410 | biostudies-literature