Dataset Information

PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies.

ABSTRACT: Objective: Standard approaches for large scale phenotypic screens using electronic health record (EHR) data apply thresholds, such as ≥2 diagnosis codes, to define subjects as having a phenotype. However, the variation in the accuracy of diagnosis codes can impair the power of such screens. Our objective was to develop and evaluate an approach which converts diagnosis codes into a probability of a phenotype (PheProb). We hypothesized that this alternate approach for defining phenotypes would improve power for genetic association studies. Methods: The PheProb approach employs unsupervised clustering to separate patients into 2 groups based on diagnosis codes. Subjects are assigned a probability of having the phenotype based on the number of diagnosis codes. This approach was developed using simulated EHR data and tested in a real world EHR cohort. In the latter, we tested the association between low density lipoprotein cholesterol (LDL-C) genetic risk alleles known for association with hyperlipidemia and hyperlipidemia codes (ICD-9 272.x). PheProb and thresholding approaches were compared. Results: Among n = 1462 subjects in the real world EHR cohort, the threshold-based p-values for association between the genetic risk score (GRS) and hyperlipidemia were 0.126 (≥1 code), 0.123 (≥2 codes), and 0.142 (≥3 codes). The PheProb approach produced the expected significant association between the GRS and hyperlipidemia: p = .001. Conclusions: PheProb improves statistical power for association studies relative to standard thresholding approaches by leveraging information about the phenotype in the billing code counts. The PheProb approach has direct applications where efficient approaches are required, such as in Phenome-Wide Association Studies.
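The Methods summary above describes clustering patients into two groups by diagnosis code counts and assigning each a phenotype probability. The following is a minimal sketch of that general idea only, assuming a two-component Poisson mixture fit by EM. The mixture family, the function names (`poisson_logpmf`, `posterior_high`, `fit_two_poisson_mixture`), and the code counts are illustrative assumptions and are not taken from the publication, whose actual model may differ.

```python
import math

def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def posterior_high(k, w, lam_low, lam_high):
    """Posterior probability that count k came from the high-rate cluster."""
    log_hi = math.log(w) + poisson_logpmf(k, lam_high)
    log_lo = math.log(1.0 - w) + poisson_logpmf(k, lam_low)
    m = max(log_hi, log_lo)  # subtract the max for numerical stability
    hi, lo = math.exp(log_hi - m), math.exp(log_lo - m)
    return hi / (hi + lo)

def fit_two_poisson_mixture(counts, n_iter=200):
    """EM for a two-component Poisson mixture (assumed model, not the paper's).

    Returns (w, lam_low, lam_high), where w is the high-rate cluster weight.
    """
    n = len(counts)
    mean = sum(counts) / n
    w, lam_low, lam_high = 0.5, max(0.25 * mean, 1e-3), max(2.0 * mean, 1.0)
    for _ in range(n_iter):
        # E-step: responsibility of the high-rate cluster for each patient.
        resp = [posterior_high(k, w, lam_low, lam_high) for k in counts]
        n_high = sum(resp)
        # M-step: update the mixing weight and both Poisson rates.
        w = min(max(n_high / n, 1e-9), 1.0 - 1e-9)
        lam_high = max(
            sum(r * k for r, k in zip(resp, counts)) / max(n_high, 1e-12), 1e-6
        )
        lam_low = max(
            sum((1.0 - r) * k for r, k in zip(resp, counts)) / max(n - n_high, 1e-12),
            1e-6,
        )
    return w, lam_low, lam_high

# Hypothetical per-patient counts of a diagnosis code (made-up data).
counts = [0] * 40 + [1] * 25 + [2] * 10 + [3] * 3 + [7] * 4 + [9] * 6 + [10] * 5 + [12] * 4 + [15] * 3

w, lam_low, lam_high = fit_two_poisson_mixture(counts)

# Compare a hard ">= 2 codes" threshold with the soft probability.
for k in (0, 1, 2, 3, 7, 12):
    threshold_label = int(k >= 2)
    prob = posterior_high(k, w, lam_low, lam_high)
    print(f"codes={k:2d}  threshold(>=2)={threshold_label}  probability={prob:.3f}")
```

In a sketch like this, a patient with two or three codes receives an intermediate probability rather than being forced into case or control status, which is the kind of information a fixed threshold discards.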

SUBMITTER: Sinnott JA

PROVIDER: S-EPMC6915826 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature

Publications

PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies.

Sinnott JA, Cai F, Yu S, Hejblum BP, Hong C, Kohane IS, Liao KP

Journal of the American Medical Informatics Association (JAMIA), 2018 Oct 1, issue 10

PMID: 29788308

Similar Datasets

Multiethnic genetic association studies improve power for locus discovery. | S-EPMC2935880 | biostudies-literature

Power calculations for genetic association studies using estimated probability distributions. | S-EPMC379135 | biostudies-other

Using eQTL weights to improve power for genome-wide association studies: a genetic study of childhood asthma. | S-EPMC3668139 | biostudies-literature

Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes. | S-EPMC6326114 | biostudies-literature

Combining controls can improve power in two-stage association studies. | S-EPMC6171163 | biostudies-literature

Homogeneous case subgroups increase power in genetic association studies. | S-EPMC4795054 | biostudies-literature

Improving power in genetic-association studies via wavelet transformation. | S-EPMC2759953 | biostudies-literature

Using Family History Data to Improve the Power of Association Studies: Application to Cancer in UK Biobank. | S-EPMC11867219 | biostudies-literature

Controlling for background genetic effects using polygenic scores improves the power of genome-wide association studies. | S-EPMC8486788 | biostudies-literature

Using public control genotype data to increase power and decrease cost of case-control genetic association studies. | S-EPMC3133924 | biostudies-literature