Dataset Information

SNPs selection using support vector regression and genetic algorithms in GWAS.


ABSTRACT:

Introduction

This paper proposes a new methodology for simultaneously selecting the SNP markers most relevant to characterizing any measurable phenotype described by a continuous variable. It uses Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and it draws jointly on statistical tools, machine learning, and computational intelligence.
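The wrapper idea described above can be sketched as a binary genetic algorithm in which each individual is a 0/1 mask over the markers. This is a minimal illustration, not the authors' implementation: `run_binary_ga`, its parameters, and `toy_fitness` are hypothetical names, and the toy fitness is only a stand-in for the cross-validated Support Vector Regression score that the paper uses as the fitness function.

```python
import random

def run_binary_ga(n_features, fitness, pop_size=30, generations=40,
                  crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Maximise fitness(mask) over binary masks of length n_features (>= 2)."""
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags: 1 means "keep this marker".
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(scored):
        # Binary tournament: return the fitter of two random individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        next_pop = [best[:]]  # elitism: keep the best mask found so far
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        population = next_pop
        best = max(population + [best], key=fitness)
    return best

# ASSUMPTION: a made-up fitness for illustration only. In the paper's setting
# this would be a prediction score of an SVR model trained on the selected SNPs.
HYPOTHETICAL_RELEVANT = {1, 4, 7}

def toy_fitness(mask):
    hits = sum(mask[i] for i in HYPOTHETICAL_RELEVANT)
    extras = sum(mask) - hits
    return hits - 0.5 * extras  # reward relevant markers, penalise the rest

best_mask = run_binary_ga(12, toy_fitness)
print("selected markers:", [i for i, bit in enumerate(best_mask) if bit])
```

Because the best mask is carried over each generation, the returned fitness never falls below that of the initial population. Swapping `toy_fitness` for a model-based score is the only change needed to turn this into a wrapper around a real regressor.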

Results

The suggested method showed potential on simulated database 1, which has additive effects only, and on the real database. Simulated database 1 contains 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise; the method identified 21 markers, of which 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, the initial 50,752 SNPs were reduced to 3,073 markers, increasing the accuracy of the model. On simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS.
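The counts reported for simulated database 1 and for the real database can be turned into selection rates with a few lines of arithmetic. The sketch below uses only the figures quoted above; the function name `selection_metrics` and the choice of precision and recall as summary measures are assumptions made here for illustration, not metrics reported in the abstract.

```python
def selection_metrics(selected, true_positives, relevant):
    """Return (false positives, precision, recall) for a marker selection."""
    false_positives = selected - true_positives
    precision = true_positives / selected   # share of selected markers that are relevant
    recall = true_positives / relevant      # share of relevant markers that were found
    return false_positives, precision, recall

# Simulated database 1: 21 markers selected, 5 of the 7 relevant SNPs among them.
fp, precision, recall = selection_metrics(selected=21, true_positives=5, relevant=7)
print(f"false positives: {fp}")        # 16
print(f"precision: {precision:.3f}")   # 0.238
print(f"recall: {recall:.3f}")         # 0.714

# Real database: 50,752 SNPs reduced to 3,073 markers.
retained = 3073 / 50752
print(f"markers retained: {retained:.1%}")  # 6.1%
```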

Conclusions

The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): applying the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel (PUK) eliminated many redundant markers, improving the prediction and accuracy of the model on the real database without quality control filters. The PUK was shown to replicate the performance of the linear and RBF kernels.
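One way to see why the PUK can stand in for an RBF kernel is to evaluate it numerically. The sketch below assumes the commonly published Pearson VII form of the kernel, with a width parameter `sigma` and a shape parameter `omega`; the abstract does not state the formula or the parameter values used, so the function names and the numbers here are illustrative only. It covers only the RBF side of the claim, not the linear kernel.

```python
import math

def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (assumed form):
    K = 1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2) ** omega
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [0.0, 0.0], [1.5, 0.0]
sigma = 2.0
# For large omega the PUK tends to a Gaussian with gamma = 4 * ln(2) / sigma^2.
gamma = 4.0 * math.log(2.0) / sigma ** 2
for omega in (1.0, 10.0, 1e4):
    print(omega, puk_kernel(x, y, sigma, omega), rbf_kernel(x, y, gamma))
```

In this form `sigma` is the full width at half maximum: two points at distance `sigma / 2` always have kernel value 0.5, whatever `omega` is. Small `omega` gives heavy, Lorentzian-like tails, and increasing it moves the shape toward the Gaussian.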

SUBMITTER: de Oliveira FC 

PROVIDER: S-EPMC4243330 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC1390438 | biostudies-literature
| S-EPMC6744342 | biostudies-literature
| S-EPMC1409801 | biostudies-literature
| S-EPMC3110013 | biostudies-literature
| S-EPMC4106049 | biostudies-other
| S-EPMC2875201 | biostudies-literature
| S-EPMC6865618 | biostudies-literature
| S-EPMC2367560 | biostudies-literature
| S-EPMC3748153 | biostudies-literature
| S-EPMC8159845 | biostudies-literature