
Dataset Information


An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn's disease patients.


ABSTRACT: Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solving the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, owing to the complexity of the challenge, the relative scarcity of data, and issues such as batch effects and data heterogeneity, which are confounding factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn's disease (CD) patients that addresses many of the current methodological issues. First, we devise a rational, ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for datasets with small sample sizes. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on 3 CD case-control datasets, comparing its performance with that of the participants in previous CAGI challenges. We show that, notwithstanding its limited complexity, the NN outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and by investigating the decision process leading to each prediction.
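
The gene mutational burden idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gene_burden_features`, the `(gene, consequence)` input format, and the example gene list are all assumptions made for illustration. The sketch simply counts how many variants fall in each gene for one sample, which yields a fixed-length feature vector regardless of how many variants the sample carries.

```python
from collections import Counter


def gene_burden_features(variants, genes):
    """Return per-gene variant counts for one sample.

    variants: iterable of (gene_symbol, consequence) tuples
              (hypothetical input format, assumed for this sketch).
    genes:    ordered list of gene symbols that fixes the feature order.
    """
    # Count variants per gene; the consequence field is ignored here.
    counts = Counter(gene for gene, _ in variants)
    # Genes with no observed variant get a burden of 0.
    return [counts.get(gene, 0) for gene in genes]


# Illustrative sample: gene symbols and consequences are made up for the example.
sample = [("NOD2", "missense"), ("NOD2", "frameshift"), ("ATG16L1", "missense")]
genes = ["NOD2", "ATG16L1", "IL23R"]
features = gene_burden_features(sample, genes)
```

A representation of this kind keeps the number of input features equal to the number of genes considered, which is one plausible reason a burden-style encoding suits datasets with few samples.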

SUBMITTER: Raimondi D 

PROVIDER: S-EPMC7671306 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature


Publications

An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn's disease patients.

Raimondi D, Simm J, Arany A, Fariselli P, Cleynen I, Moreau Y

NAR Genomics and Bioinformatics, 2020-02-21, 1



Similar Datasets

Date | Accession | Repository
- | S-EPMC7937228 | biostudies-literature
- | S-EPMC10513274 | biostudies-literature
- | S-EPMC7818326 | biostudies-literature
- | S-EPMC6398390 | biostudies-literature
- | S-EPMC10914836 | biostudies-literature
- | S-EPMC6300887 | biostudies-other
2024-01-27 | GSE230012 | GEO
- | S-EPMC10017874 | biostudies-literature
2024-01-28 | GSE230011 | GEO
2024-01-28 | GSE230010 | GEO