PS3-14: CREX: Utility of a Computerized Methodology to Identify Health Conditions Using the EMR for GWAS, in the Kaiser Permanente Research Program on Genes, Environment, and Health.
ABSTRACT:

Background/Aims: The Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort has genotype data on more than 100,000 participants. We wanted to characterize the many health conditions of interest among respondents for sample-size estimation and GWAS. To do so, we sought to extend methodologies that use the Electronic Medical Record (EMR) in an automated way. Our aim was to characterize many diseases without the cost and limitations of numerous complex algorithms, each specific to an individual disease or condition.

Methods: We tested a probabilistic approach that scored clinical decisions recorded in the EMR to capture specific diagnostic and treatment domains. We first considered physician diagnosis alone. We assessed sensitivity and specificity against internal registries at the Kaiser Permanente Northern California Division of Research. We also used the methodology to characterize Type 1 and Type 2 diabetes phenotypes for GWAS. We tested a single diagnostic domain in a logistic model based on an ICD-9-CM taxonomy. Compared with internal registries, it showed high sensitivity and specificity for breast, lung, colon and prostate cancers, Barrett's esophagus, HIV, Crohn's disease, ulcerative colitis, and diabetes. We then assessed a logistic regression model that distinguished between members of the genotyped cohort with Type 1 and Type 2 diabetes. The model used three inputs: the ICD-9 taxonomy, the earliest age at diagnosis available in the EMR or by self-report, and pharmacy prescription utilization of anti-diabetic drugs.

Results: Compared with the gold-standard internal diabetes registry, the model had a sensitivity of 96.3% and a specificity of 99.6% for Type 1 diabetes. For Type 2 diabetes, sensitivity was 94.0% and specificity was 98.4%. We identified an additional 60 cases of Type 1 diabetes and conducted comparative GWAS.

Conclusions: Using diagnostic information in the EMR as independent domains in probabilistic models to create phenotypes appears to be a reliable approach. It supports robust characterization of numerous disease phenotypes for evaluation, analysis and mapping. The method is agnostic to the input taxonomy as long as the EMR contains sufficient and reliable atomic detail. When gold-standard databases are not available, it can also be adapted for machine learning given expert user feedback.
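The abstract describes a logistic model that combines an ICD-9-based diagnosis flag, earliest age at diagnosis, and anti-diabetic prescription use. The model's output is then scored for sensitivity and specificity against a registry. The sketch below is only an illustration of that general pattern. The study's actual features, fitted coefficients and decision threshold are not reported here, so every one of the following is hypothetical: the feature names, the coefficients, the 30-year age cutoff, the 0.5 threshold and the toy members.

```python
import math
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical per-member features, loosely mirroring the three inputs named
    # in the abstract; these are NOT the study's actual variable definitions.
    has_t1d_code: bool       # ICD-9-derived flag suggesting Type 1 diabetes
    earliest_dx_age: float   # earliest age at diagnosis (EMR or self-report)
    insulin_only: bool       # insulin dispensed with no oral anti-diabetic drugs


# Hypothetical coefficients chosen for illustration; not fitted to any data.
COEFS = {"intercept": -2.0, "t1d_code": 3.0, "age_under_30": 2.0, "insulin_only": 2.5}


def p_type1(m: Member) -> float:
    """Logistic probability that a member has Type 1 diabetes."""
    z = (COEFS["intercept"]
         + COEFS["t1d_code"] * m.has_t1d_code
         + COEFS["age_under_30"] * (m.earliest_dx_age < 30)   # hypothetical cutoff
         + COEFS["insulin_only"] * m.insulin_only)
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(predicted, truth):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP), versus a registry."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one registry case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy members and toy registry labels (invented for illustration only).
members = [
    Member(True, 12, True),
    Member(False, 55, False),
    Member(True, 40, False),
    Member(False, 25, True),
    Member(False, 60, False),
]
registry = [True, False, False, True, False]

predicted = [p_type1(m) >= 0.5 for m in members]   # hypothetical 0.5 threshold
sens, spec = sensitivity_specificity(predicted, registry)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In practice, the coefficients would be estimated from labeled data, for example against a gold-standard registry, rather than set by hand. The threshold then trades sensitivity against specificity.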
SUBMITTER: Sciortino S
PROVIDER: S-EPMC3788526 | biostudies-other | 2013 Sep
REPOSITORIES: biostudies-other