ABSTRACT: Although risk for harmful alcohol use is heritable, few genetic risk variants have been identified. Longitudinal electronic health record (EHR) data offer a largely untapped source of phenotypic information for genetic studies, but EHR-derived phenotypes for harmful alcohol exposure have yet to be validated. Using EHR data and a genetic variant of known effect, we developed and validated a phenotype for harmful alcohol exposure that can be used to identify unknown genetic variants in large samples. Herein, we consider the validity of 3 approaches that use the 3-item Alcohol Use Disorders Identification Test consumption measure (AUDIT-C) as a phenotype for harmful alcohol exposure.

First, using longitudinal AUDIT-C data from the Veterans Aging Cohort Study Biomarker Cohort (VACS-BC), we compared 3 AUDIT-C metrics using correlation coefficients: (i) the AUDIT-C score closest to blood sampling (closest AUDIT-C), (ii) the highest score (highest AUDIT-C), and (iii) longitudinal trajectories generated using joint trajectory modeling (AUDIT-C trajectory). Second, in the overall sample, we used chi-square tests for trend to compare the associations of the 3 AUDIT-C metrics with phosphatidylethanol (PEth), a direct, quantitative biomarker of alcohol consumption. Last, in the subsample of African Americans (AAs; n = 1,503), we compared the associations of the 3 AUDIT-C metrics with rs2066702, a common missense (Arg369Cys) polymorphism in the ADH1B gene, which encodes an alcohol dehydrogenase isozyme.

The sample (n = 1,851; 94.5% male; 65% HIV+; mean age 52 years) had a median of 7 AUDIT-C scores over a median of 6.1 years. Highest AUDIT-C and AUDIT-C trajectory were correlated (r = 0.86). The closest AUDIT-C was obtained a median of 2.26 years after the VACS-BC blood draw. Overall and among AAs, all 3 AUDIT-C metrics were associated with PEth (all p < 0.05), but the gradient was steepest for AUDIT-C trajectory.
Among AAs (36% with the protective ADH1B allele), the associations of rs2066702 with AUDIT-C trajectory and with highest AUDIT-C were statistically significant (p < 0.05), and the gradient was steeper for AUDIT-C trajectory than for highest AUDIT-C. The closest AUDIT-C was not statistically significantly associated with rs2066702.

EHR data can be used to identify complex phenotypes such as harmful alcohol use, and the validity of such phenotypes may be enhanced by using longitudinal trajectories.
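As an illustration only, the two simpler AUDIT-C metrics and a chi-square test for trend might be computed as in the minimal Python sketch below. It is not the study's code. The per-patient input format (days relative to the blood draw paired with an AUDIT-C score), the function names, and the use of a binary outcome such as a dichotomized PEth result are all hypothetical assumptions. The trend test shown is the Cochran-Armitage form of the chi-square test for trend in a 2 x k table. The AUDIT-C trajectory metric requires a fitted trajectory model and is not sketched here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: for each patient, a list of
# (days relative to the blood draw, AUDIT-C score) pairs.
Records = Dict[str, List[Tuple[int, int]]]


def closest_audit_c(records: Records) -> Dict[str, int]:
    """AUDIT-C score recorded nearest in time to the blood draw (before or after)."""
    return {pid: min(obs, key=lambda o: abs(o[0]))[1]
            for pid, obs in records.items() if obs}


def highest_audit_c(records: Records) -> Dict[str, int]:
    """Maximum AUDIT-C score observed over follow-up."""
    return {pid: max(score for _, score in obs)
            for pid, obs in records.items() if obs}


def trend_test(cases: Sequence[int], totals: Sequence[int],
               scores: Sequence[float]) -> Tuple[float, float]:
    """Cochran-Armitage chi-square test for trend in a 2 x k table.

    cases[i]  -- participants in ordered group i with the binary outcome
                 (e.g. a dichotomized PEth result; an assumption here)
    totals[i] -- all participants in group i
    scores[i] -- ordinal score assigned to group i (e.g. AUDIT-C category)
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    n_total = sum(totals)
    p_bar = sum(cases) / n_total
    # Observed-minus-expected cases in each group, weighted by the group score.
    stat = sum(t * (r - n * p_bar) for t, r, n in zip(scores, cases, totals))
    sum_tn = sum(t * n for t, n in zip(scores, totals))
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_t2n - sum_tn ** 2 / n_total)
    if variance <= 0:
        raise ValueError("trend test undefined: no variation in outcome or scores")
    z = stat / math.sqrt(variance)
    # Under no trend, z**2 is approximately chi-square with 1 degree of freedom.
    return z * z, math.erfc(abs(z) / math.sqrt(2))
```

For example, `closest_audit_c({"A": [(-400, 3), (30, 5), (900, 8)]})` picks the score recorded 30 days after the draw, whereas `highest_audit_c` on the same records picks the maximum over all follow-up. The design choice worth noting is that "closest" may fall before or after the blood draw, which is consistent with the abstract's report that it was obtained a median of 2.26 years after the draw.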