Dataset Information

Identification of disease-associated loci using machine learning for genotype and network data integration.

ABSTRACT:

MOTIVATION: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci.

RESULTS: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals' ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user's research needs.

AVAILABILITY AND IMPLEMENTATION: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
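For readers unfamiliar with the factorization named in the abstract, the following is a minimal sketch of a generic, uncorrected non-negative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates. It is not the cnmtf R package and does not reproduce the paper's method: the ancestry correction, the variant-effect scores and the gene-network terms are all omitted. The function name `nmtf`, its parameters and the random toy matrix are assumptions made purely for illustration.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Generic NMTF sketch: approximate non-negative X (n x m) as F @ S @ G.T.

    F (n x k1) groups the rows, G (m x k2) groups the columns, and
    S (k1 x k2) links row groups to column groups. Hypothetical helper,
    not part of any published package.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random starting points keep the updates well defined.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius loss ||X - F S G^T||^2;
        # each factor stays non-negative because it is only rescaled.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors

# Toy non-negative matrix standing in for, e.g., a variants-by-individuals table.
X = np.random.default_rng(1).random((30, 20))
F, S, G, errors = nmtf(X, k1=3, k2=2)

# Hard cluster labels: each row/column goes to its largest factor loading.
row_clusters = F.argmax(axis=1)
col_clusters = G.argmax(axis=1)
```

In a sketch like this, the rows of `F` and `G` act as soft cluster memberships, which is the sense in which tri-factorization is "based on clustering techniques"; how cNMTF scores loci-trait associations from its factors is described in the publication itself.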

SUBMITTER: Leal LG

PROVIDER: S-EPMC6954643 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Identification of disease-associated loci using machine learning for genotype and network data integration.

Leal LG, David A, Jarvelin MR, Sebert S, Männikkö M, Karhunen V, Seaby E, Hoggart C, Sternberg MJE

Bioinformatics (Oxford, England), 2019-12-01, issue 24

PMID: 31070705

Similar Datasets

Identification of a glioma functional network from gene fitness data using machine learning.
| S-EPMC8831986 | biostudies-literature

Integration of pan-cancer multi-omics data for novel mixed subgroup identification using machine learning methods.
| S-EPMC10586677 | biostudies-literature

Drug repositioning: a machine-learning approach through data integration.
| S-EPMC3704944 | biostudies-literature

Machine learning for data integration in human gut microbiome.
| S-EPMC9685977 | biostudies-literature

Rapid identification of wood species using XRF and neural network machine learning.
| S-EPMC8413463 | biostudies-literature

Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer.
| S-EPMC9239907 | biostudies-literature

Identification of the human DPR core promoter element using machine learning
2020-06-04 | GSE139635 | GEO

Identification of postoperative complications using electronic health record data and machine learning.
| S-EPMC7183252 | biostudies-literature

Machine Learning-Based Peripheral Artery Disease Identification Using Laboratory-Based Gait Data.
| S-EPMC9572112 | biostudies-literature

Using machine learning to improve anaphylaxis case identification in medical claims data.
| S-EPMC10611436 | biostudies-literature