
Dataset Information


Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls.


ABSTRACT: We propose an effective machine learning approach to identify groups of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk, assuming dependencies among BCAC iCOGS SNPs. We adopt a gradient tree boosting method followed by an adaptive iterative SNP search to capture complex non-linear SNP-SNP interactions and, consequently, obtain a group of interacting SNPs with high BC risk-predictive potential. We also propose a support vector machine formed from the identified SNPs to classify BC cases and controls. Our approach achieves a mean average precision (mAP) of 72.66, 67.24 and 69.25 in discriminating BC cases and controls in the KBCP, OBCS and merged KBCP-OBCS sample sets, respectively. These results are better than the mAP of 70.08, 63.61 and 66.41 obtained in the same three sample sets, respectively, with a polygenic risk score model derived from 51 known BC-associated SNPs. BC subtype analysis further reveals that the 200 KBCP SNPs identified by the proposed method perform favorably in classifying estrogen receptor positive (ER+) and negative (ER-) BC cases in both KBCP and OBCS data. Further, a biological analysis of the identified SNPs reveals genes related to important BC-related mechanisms, estrogen metabolism and apoptosis.
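The comparison in the abstract rests on two quantities: a polygenic risk score (PRS) and average precision. As a rough illustration only, and not the authors' implementation, the sketch below computes a PRS as a weighted sum of risk-allele counts and scores the resulting ranking with average precision. The function names, the per-SNP weights, and the toy genotypes are all hypothetical. The abstract does not say how the mean over average-precision values was taken, so a single average-precision computation is shown.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true case.

    labels: 1 for case, 0 for control.
    scores: higher means higher predicted risk.
    Tied scores keep their input order (Python's sort is stable).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical per-SNP weights (for example, log odds ratios) and toy genotypes.
weights = [0.5, 0.25, 0.125]
genotypes = [
    [2, 1, 0],  # case
    [0, 0, 1],  # control
    [1, 2, 2],  # case
    [0, 1, 0],  # control
    [0, 0, 0],  # case
]
labels = [1, 0, 1, 0, 1]

scores = [polygenic_risk_score(g, weights) for g in genotypes]
ap = average_precision(labels, scores)
print(f"average precision: {ap:.4f}")
```

In this toy ranking two cases are placed first and one case last, so the score is pulled below 1 by the late hit. A classifier such as the support vector machine described in the abstract would be evaluated the same way, with its decision values in place of the PRS.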

SUBMITTER: Behravan H 

PROVIDER: S-EPMC6120908 | biostudies-literature | 2018 Sep

REPOSITORIES: biostudies-literature


Publications

Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls.

Behravan H, Hartikainen JM, Tengström M, Pylkäs K, Winqvist R, Kosma VM, Mannermaa A

Scientific Reports, 2018-09-03, issue 1



Similar Datasets

| S-EPMC2806404 | biostudies-literature
| S-EPMC7654779 | biostudies-literature
| S-EPMC9373844 | biostudies-literature
2022-07-27 | GSE209804 | GEO
2022-05-20 | GSE203423 | GEO
| S-EPMC3628758 | biostudies-literature
2021-01-14 | GSE164788 | GEO
| S-EPMC7684976 | biostudies-literature
| S-EPMC4955649 | biostudies-other
| S-EPMC4054890 | biostudies-other