Dataset Information

Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS.


ABSTRACT:

Background

ReliefF is a nearest-neighbor-based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, such as genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding.
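As a rough illustration of what a GRM measures, the sketch below computes the commonly used standardized form, in which each SNP's contribution is centered by twice its allele frequency and scaled by its expected variance. This is an assumption about the general technique, not the authors' implementation: the abstract does not state which GRM estimator is used or how it is turned into a nearest-neighbor metric. The function name `grm` and the 0/1/2 minor-allele-count coding are hypothetical choices made for the example.

```python
from typing import List, Sequence


def grm(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Standardized genetic relationship matrix (illustrative sketch).

    genotypes[j][i] is the minor-allele count (0, 1, or 2) of sample j at SNP i.
    This coding is an assumption; the abstract does not specify it.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each SNP, estimated from the samples themselves.
    freqs = [sum(row[i] for row in genotypes) / (2.0 * n) for i in range(m)]
    # Monomorphic SNPs have zero variance and cannot be standardized.
    usable = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not usable:
        raise ValueError("no polymorphic SNPs to build a GRM from")

    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            total = 0.0
            for i in usable:
                p = freqs[i]
                total += (
                    (genotypes[j][i] - 2.0 * p)
                    * (genotypes[k][i] - 2.0 * p)
                    / (2.0 * p * (1.0 - p))
                )
            a[j][k] = a[k][j] = total / len(usable)
    return a


# Toy data: three samples, two SNPs (values are made up for illustration).
toy = [[0, 1], [2, 1], [1, 1]]
relatedness = grm(toy)
```

Because rare-allele sharing is up-weighted by the variance term, two samples that share a rare genotype count as more related than two that share a common one, which is the sense in which the GRM adjusts for allele frequency.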

Methods

We introduce a new two-dimensional transition/transversion genotype encoding for ReliefF, and we implement three ReliefF attribute metrics: (1) genotype mismatch (GM), which is the ReliefF standard; (2) allele mismatch (AM), which accounts for heterozygous differences and has not been used previously in ReliefF; and (3) the new transition/transversion metric. We incorporate these attribute metrics into the ReliefF nearest-neighbor calculation with a Manhattan metric, and we introduce GRM as a new ReliefF nearest-neighbor metric to adjust for allele frequency heterogeneity.
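A minimal sketch of the first two attribute metrics and the Manhattan combination follows. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and that AM is the absolute difference in allele counts divided by two; both are assumptions, since the abstract does not give the formulas. The transition/transversion metric is omitted because the abstract does not specify its two-dimensional encoding. All function names are hypothetical.

```python
from typing import Callable, Sequence


def gm_diff(g1: int, g2: int) -> float:
    """Genotype mismatch: 0 if the genotypes are identical, otherwise 1."""
    return 0.0 if g1 == g2 else 1.0


def am_diff(g1: int, g2: int) -> float:
    """Allele mismatch (assumed form): fraction of the two alleles that differ."""
    return abs(g1 - g2) / 2.0


def manhattan_distance(
    x: Sequence[int],
    y: Sequence[int],
    diff: Callable[[int, int], float],
) -> float:
    """Sum the per-attribute differences between two samples."""
    if len(x) != len(y):
        raise ValueError("samples must have the same number of attributes")
    return sum(diff(a, b) for a, b in zip(x, y))


# Two toy samples over four SNPs (made-up values).
sample_a = [0, 1, 2, 2]
sample_b = [0, 2, 0, 1]
print(manhattan_distance(sample_a, sample_b, gm_diff))  # 3.0
print(manhattan_distance(sample_a, sample_b, am_diff))  # 2.0
```

The example shows why AM is finer-grained than GM: a heterozygote versus a homozygote contributes 0.5 under AM but a full 1 under GM, so the two metrics can rank neighbors differently.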

Results

We apply ReliefF with each metric to a GWAS of major depressive disorder and compare the detection of genes in pathways implicated in depression, including Axon Guidance, Neuronal System, and G Protein-Coupled Receptor Signaling. We also compare with detection by Random Forest and Lasso as well as random/null selection to assess pathway size bias.
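One way to think about the random/null comparison is as a resampling check: a large pathway will capture some top-ranked genes by chance alone, so the observed overlap is compared with overlaps from random gene sets of the same size. The sketch below is an illustration under that assumption, not the authors' procedure; the function names, the number of draws, and the gene labels are all hypothetical.

```python
import random
from typing import Iterable, List, Sequence


def pathway_hits(selected: Iterable[str], pathway: Iterable[str]) -> int:
    """Number of selected genes that belong to the pathway."""
    return len(set(selected) & set(pathway))


def null_hit_distribution(
    all_genes: Sequence[str],
    pathway: Iterable[str],
    k: int,
    n_draws: int = 1000,
    seed: int = 0,
) -> List[int]:
    """Pathway hits for n_draws random selections of k genes."""
    rng = random.Random(seed)
    genes = list(all_genes)
    members = set(pathway)
    return [pathway_hits(rng.sample(genes, k), members) for _ in range(n_draws)]


def empirical_p(observed: int, null_hits: Sequence[int]) -> float:
    """Share of null draws with at least as many hits as observed (add-one corrected)."""
    return (1 + sum(h >= observed for h in null_hits)) / (1 + len(null_hits))


# Hypothetical gene universe and pathway, for illustration only.
universe = [f"GENE{i}" for i in range(200)]
pathway = universe[:40]
top_ranked = universe[:10] + universe[100:110]  # stand-in for a method's top 20

observed = pathway_hits(top_ranked, pathway)
null_hits = null_hit_distribution(universe, pathway, k=len(top_ranked))
p_value = empirical_p(observed, null_hits)
```

Running the same check for each selection method and each pathway gives a baseline for how much of the detection is explained by pathway size alone.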

Conclusions

Our results suggest that using more genetically motivated encodings, such as transition/transversion, and metrics that adjust for allele frequency heterogeneity, such as GRM, leads to ReliefF attribute scores with improved pathway enrichment.

SUBMITTER: Arabnejad M 

PROVIDER: S-EPMC6215626 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS.

Arabnejad M, Dawkins BA, Bush WS, White BC, Harkness AR, McKinney BA

BioData Mining, 2018-11-03


Similar Datasets

| S-EPMC7107541 | biostudies-literature
| S-EPMC8644062 | biostudies-literature
| S-EPMC7426018 | biostudies-literature
| S-EPMC5872388 | biostudies-literature
| S-EPMC6354965 | biostudies-literature
| S-EPMC1790724 | biostudies-literature
| S-EPMC9358017 | biostudies-literature
| S-EPMC6323418 | biostudies-literature
| S-EPMC4437677 | biostudies-literature
| S-EPMC3911983 | biostudies-literature