ABSTRACT: PURPOSE: Assess the impact of false-positives (FP), false-negatives (FN), fixation losses (FL), and test duration (TD) on visual field (VF) reliability at different stages of glaucoma severity. DESIGN: Retrospective. PARTICIPANTS: A total of 10,262 VFs from 1538 eyes of 909 subjects with suspect or manifest glaucoma and ≥5 VF examinations. METHODS: Predicted mean deviation (MD) was calculated with multilevel modeling of longitudinal data. Differences between predicted and observed MD (ΔMD) were calculated as a reliability measure. The impact of FP, FN, FL, and TD on ΔMD was assessed using multilevel modeling. MAIN OUTCOME MEASURES: ΔMD associated with a 10% increment in FP, FN, and FL, or a 1-minute increase in TD. RESULTS: FL had little impact on ΔMD (<0.2 decibels [dB] per 10% abnormal catch trials), and no level of FL produced ≥1 dB of ΔMD at any disease stage. FP yielded greater than expected MD: up to 20% abnormal catch trials, each 10% increment was associated with a ΔMD of 0.42, 0.73, and 0.66 dB in mild (MD > -6 dB), moderate (-6 ≥ MD > -12 dB), and severe (-12 ≥ MD ≥ -20 dB) disease, respectively; beyond 20% abnormal catch trials, each 10% increment was associated with a ΔMD of 1.57, 2.06, and 3.53 dB. FN generally produced observed MDs below expected MDs. FN had minimal impact up to 20% abnormal catch trials (ΔMD per 10% increment > -0.14 dB at all levels of severity). Beyond 20% abnormal catch trials, each 10% increment was associated with a ΔMD of -1.27, -0.53, and -0.51 dB in mild, moderate, and severe disease, respectively. |ΔMD| ≥ 1 dB occurred with 22% FP and 26% FN in mild, 14% FP and 34% FN in moderate, and 16% FP and 51% FN in severe disease. A 1-minute increment in TD produced ΔMDs between -0.35 and -0.40 dB. CONCLUSIONS: FL have little impact on reliability in patients with established glaucoma. FP, and to a lesser extent FN and TD, significantly affect reliability.
The impact of FP and FN varies with disease severity and over the range of abnormal catch trials. On the basis of our findings, we present evidence-based, severity-specific standards for classifying VF reliability for clinical or research applications.
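The figures reported in the results can be turned into a small calculation. The Python sketch below is illustrative only and is not the authors' published reliability standard or their multilevel model. It rests on several assumptions: ΔMD is taken as observed minus predicted MD (inferred from FP producing greater than expected MD with positive ΔMD), the FP effect is treated as piecewise linear with a breakpoint at 20% and zero ΔMD at 0% FP, and all function and variable names are hypothetical.

```python
# Illustrative sketch only: values are taken from the abstract, while the
# piecewise-linear form and all names here are assumptions, not the authors' code.

# ΔMD (dB) per 10% false positives: (slope up to 20% FP, slope beyond 20% FP).
FP_SLOPES = {
    "mild": (0.42, 1.57),
    "moderate": (0.73, 2.06),
    "severe": (0.66, 3.53),
}

# Catch-trial percentages at which |ΔMD| >= 1 dB was reported: (FP %, FN %).
ONE_DB_THRESHOLDS = {
    "mild": (22, 26),
    "moderate": (14, 34),
    "severe": (16, 51),
}


def severity_stage(md_db):
    """Map mean deviation (dB) to the severity stages defined in the abstract."""
    if md_db > -6:
        return "mild"
    if md_db > -12:
        return "moderate"
    if md_db >= -20:
        return "severe"
    raise ValueError("MD below -20 dB is outside the ranges given in the abstract")


def estimated_fp_delta_md(fp_percent, stage):
    """Estimate ΔMD (dB) from the FP rate, assuming zero ΔMD at 0% FP and a
    slope change at 20% FP (a simplification of the reported results)."""
    low_slope, high_slope = FP_SLOPES[stage]
    below = min(fp_percent, 20.0)         # portion of FP up to the breakpoint
    above = max(fp_percent - 20.0, 0.0)   # portion of FP beyond the breakpoint
    return (low_slope * below + high_slope * above) / 10.0


def exceeds_one_db(fp_percent, fn_percent, stage):
    """Flag a field whose FP or FN rate reaches the reported |ΔMD| >= 1 dB level."""
    fp_limit, fn_limit = ONE_DB_THRESHOLDS[stage]
    return fp_percent >= fp_limit or fn_percent >= fn_limit
```

Under these assumptions, `estimated_fp_delta_md(30, "mild")` gives about 2.41 dB (0.84 dB from the first 20% plus 1.57 dB from the next 10%). A field with 15% FP would be flagged by `exceeds_one_db` in moderate disease but not in mild disease, which illustrates why a single FP cutoff across all severities can mislead.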