
Dataset Information


CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease.


ABSTRACT: Understanding the basis of complex trait disease is a fundamental problem in human genetics. The CAGI Crohn's Exome challenges provide insight into the adequacy of current disease models by requiring participants to identify, given exome data, which individuals in a set have been diagnosed with the disease. For the CAGI4 round, we developed a method that used the genotypes from exome sequencing data only to impute the status of genome-wide association study (GWAS) marker SNPs. We then used the imputed genotypes as input to several machine learning methods that had been trained to predict disease status from marker SNP information. We achieved the best performance with Naïve Bayes and with a consensus machine learning method, obtaining an area under the curve (AUC) of 0.72, larger than that of the other methods used in CAGI4. We also developed a model that incorporated the contribution from rare missense variants in the exome data, but this performed less well. Future progress is expected to come from the use of whole-genome data rather than exomes.
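
The classification and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the marker SNP genotypes have already been imputed and coded as risk-allele counts (0, 1 or 2), it uses simulated genotypes with made-up allele frequencies, and all function names (`train_naive_bayes`, `case_probability`, `auc`, `simulate`) are hypothetical. It shows only how a Naïve Bayes model over marker genotypes yields a per-individual case probability, and how an area under the curve is computed from those probabilities.

```python
import math
import random


def train_naive_bayes(genotypes, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model on genotypes coded 0/1/2.

    alpha is a Laplace smoothing count (an assumption of this sketch).
    """
    n_snps = len(genotypes[0])
    model = {"prior": {}, "cond": {}}
    for cls in (0, 1):  # 0 = control, 1 = case
        rows = [g for g, y in zip(genotypes, labels) if y == cls]
        model["prior"][cls] = len(rows) / len(labels)
        model["cond"][cls] = []
        for j in range(n_snps):
            counts = [alpha, alpha, alpha]
            for g in rows:
                counts[g[j]] += 1
            total = sum(counts)
            model["cond"][cls].append([c / total for c in counts])
    return model


def case_probability(model, genotype):
    """Posterior probability that one individual is a case."""
    log_post = {}
    for cls in (0, 1):
        lp = math.log(model["prior"][cls])
        for j, g in enumerate(genotype):
            lp += math.log(model["cond"][cls][j][g])
        log_post[cls] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    e0 = math.exp(log_post[0] - top)
    e1 = math.exp(log_post[1] - top)
    return e1 / (e0 + e1)


def auc(scores, labels):
    """Area under the ROC curve: the fraction of case/control pairs
    in which the case scores higher (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, rng, n_snps=5, case_freq=0.45, control_freq=0.30):
    """Simulated data only: the allele frequencies are invented."""
    data, labels = [], []
    for i in range(n):
        y = i % 2  # alternate cases and controls
        freq = case_freq if y == 1 else control_freq
        # Two independent allele draws give a genotype of 0, 1 or 2.
        data.append([int(rng.random() < freq) + int(rng.random() < freq)
                     for _ in range(n_snps)])
        labels.append(y)
    return data, labels


rng = random.Random(0)
train_x, train_y = simulate(400, rng)
test_x, test_y = simulate(200, rng)
model = train_naive_bayes(train_x, train_y)
scores = [case_probability(model, g) for g in test_x]
test_auc = auc(scores, test_y)
print(f"AUC on simulated data: {test_auc:.2f}")
```

The AUC printed here reflects only the simulated effect sizes and says nothing about the 0.72 reported for the challenge data.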

SUBMITTER: Pal LR 

PROVIDER: S-EPMC5576730 | biostudies-literature | 2017 Sep

REPOSITORIES: biostudies-literature


Publications

CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease.

Lipika R. Pal, Kunal Kundu, Yizhou Yin, John Moult

Human Mutation, 2017-06-28, issue 9



Similar Datasets

| S-EPMC8532338 | biostudies-literature
| S-EPMC5509518 | biostudies-literature
| S-EPMC3470545 | biostudies-literature
| S-EPMC7898451 | biostudies-literature
| S-EPMC10528115 | biostudies-literature
| S-EPMC3695811 | biostudies-other
| S-EPMC4370664 | biostudies-literature
2016-06-15 | E-GEOD-69445 | biostudies-arrayexpress
| S-EPMC5741696 | biostudies-literature
2016-06-15 | GSE69445 | GEO