Dataset Information

Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection.


ABSTRACT:

Background

Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection.

Results

We evaluate three metrics, derived from previous works, as predictors of relative model detection difficulty: (1) penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model's EDM and COR are each stronger predictors of model detection success than heritability.
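
The abstract does not give formulas for any of these metrics, so the following is only a minimal sketch of the quantities it names, under stated assumptions: genotype frequencies follow Hardy-Weinberg proportions, loci are independent, heritability uses the textbook definition for a penetrance table (frequency-weighted variance of penetrance divided by K(1 - K), where K is prevalence), and a plain unweighted variance of the table entries stands in for PTV. EDM and COR are not reproduced here. All function names and the example table are hypothetical and are not taken from GAMETES.

```python
from itertools import product
from statistics import pvariance


def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the minor allele."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def joint_freqs(mafs):
    """Joint genotype frequencies, assuming independent loci."""
    per_locus = [genotype_freqs(m) for m in mafs]
    joint = {}
    for combo in product(range(3), repeat=len(mafs)):
        freq = 1.0
        for locus, genotype in enumerate(combo):
            freq *= per_locus[locus][genotype]
        joint[combo] = freq
    return joint


def prevalence(penetrance, freqs):
    """K: frequency-weighted mean penetrance."""
    return sum(freqs[g] * penetrance[g] for g in penetrance)


def heritability(penetrance, freqs):
    """Frequency-weighted variance of penetrance, scaled by K(1 - K)."""
    k = prevalence(penetrance, freqs)
    weighted_var = sum(freqs[g] * (penetrance[g] - k) ** 2 for g in penetrance)
    return weighted_var / (k * (1.0 - k))


def table_variance(penetrance):
    """Assumed stand-in for PTV: unweighted variance of the table entries."""
    return pvariance(list(penetrance.values()))


def marginal_penetrance(penetrance, freqs, locus):
    """Penetrance of each single-locus genotype, averaged over the other loci."""
    result = []
    for genotype in range(3):
        cells = [g for g in penetrance if g[locus] == genotype]
        total = sum(freqs[g] for g in cells)
        result.append(sum(freqs[g] * penetrance[g] for g in cells) / total)
    return result


# Hypothetical two-locus table: risk 0.1 when exactly one locus is heterozygous.
model = {(a, b): (0.1 if (a + b) % 2 == 1 else 0.0)
         for a, b in product(range(3), repeat=2)}
freqs = joint_freqs([0.5, 0.5])

K = prevalence(model, freqs)
h2 = heritability(model, freqs)
ptv = table_variance(model)
margins = [marginal_penetrance(model, freqs, locus) for locus in (0, 1)]
```

In this illustrative table every single-locus marginal penetrance equals the prevalence, so neither locus shows a main effect on its own, which is the "pure" property the title refers to. Rearranging the same penetrance values into a different table changes the architecture while leaving the set of values, and therefore this unweighted variance, unchanged.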

Conclusions

This study formally identifies and evaluates metrics which quantify model detection difficulty. We use these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and use of EDM and COR in GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.

SUBMITTER: Urbanowicz RJ 

PROVIDER: S-EPMC3549792 | biostudies-literature | 2012 Sep

REPOSITORIES: biostudies-literature

Publications

Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection.

Urbanowicz RJ, Kiralis J, Fisher JM, Moore JH

BioData Mining, 2012-09-26, issue 1



Similar Datasets

| S-EPMC4094921 | biostudies-literature
| S-EPMC3470561 | biostudies-literature
| S-EPMC7466977 | biostudies-literature
| S-EPMC6805226 | biostudies-literature
| S-EPMC6713450 | biostudies-literature
| S-EPMC5321813 | biostudies-other
| S-EPMC6355851 | biostudies-other
| S-EPMC5637366 | biostudies-literature
| S-EPMC4833017 | biostudies-literature
| PRJNA658988 | ENA