ABSTRACT:
OBJECTIVE: In response to a need for better sepsis diagnostics, several new gene expression classifiers have recently been published, including the 11-gene "Sepsis MetaScore," the "FAIM3-to-PLAC8" ratio, and the Septicyte Lab. We performed a systematic search for publicly available gene expression data in sepsis and tested each classifier in all included datasets. We also created a public repository of sepsis gene expression data to encourage their future reuse.
DATA SOURCES: We searched the National Institutes of Health Gene Expression Omnibus and EBI ArrayExpress for human gene expression microarray datasets. We also included the Glue Grant trauma gene expression cohorts.
STUDY SELECTION: We selected clinical, time-matched, whole blood studies of sepsis and acute infections as compared to healthy controls and/or patients with noninfectious inflammation. We identified 39 datasets composed of 3,241 samples from 2,604 patients.
DATA EXTRACTION: All data were renormalized from raw data, when available, using consistent methods.
DATA SYNTHESIS: Mean validation areas under the receiver operating characteristic curve for discriminating septic patients from patients with noninfectious inflammation for the Sepsis MetaScore, the FAIM3-to-PLAC8 ratio, and the Septicyte Lab were 0.82 (range, 0.73-0.89), 0.78 (range, 0.49-0.96), and 0.73 (range, 0.44-0.90), respectively. Paired-sample t tests of validation datasets showed no significant differences in area under the receiver operating characteristic curve among the three classifiers. Mean validation areas under the receiver operating characteristic curve for discriminating infected patients from healthy controls for the Sepsis MetaScore, the FAIM3-to-PLAC8 ratio, and the Septicyte Lab were 0.97 (range, 0.85-1.0), 0.94 (range, 0.65-1.0), and 0.71 (range, 0.24-1.0), respectively. Few significant differences by pathogen type were observed for any of the diagnostics.
CONCLUSIONS: The three diagnostics do not show significant differences in overall ability to distinguish noninfectious systemic inflammatory response syndrome from sepsis, though performance in some datasets was low (area under the receiver operating characteristic curve, < 0.7) for the FAIM3-to-PLAC8 ratio and the Septicyte Lab. The Septicyte Lab also performed significantly worse in discriminating infected patients from healthy controls. Overall, public gene expression data are a useful tool for benchmarking gene expression diagnostics.
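
The benchmarking arithmetic summarized under DATA SYNTHESIS (a per-dataset area under the receiver operating characteristic curve, its mean and range across datasets, and a paired-sample t test between two classifiers) can be illustrated with a minimal sketch. This is not the authors' pipeline. The dataset names, classifier names, and scores below are made-up toy values, and the helper functions `auc` and `paired_t` are hypothetical names introduced only for this illustration. The sketch uses only the Python standard library, so it reports the t statistic and degrees of freedom rather than a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def auc(scores_pos, scores_neg):
    """AUROC in its Mann-Whitney form: the fraction of (case, control)
    pairs in which the case scores higher; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for two
    equal-length lists of per-dataset AUCs."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Toy values only (assumption): for each dataset and classifier,
# (scores in septic patients, scores in noninfectious inflammation).
toy_datasets = {
    "dataset_A": {
        "classifier_1": ([2.1, 1.8, 2.5, 1.2], [0.9, 1.3, 0.7, 1.1]),
        "classifier_2": ([0.6, 0.9, 0.4, 0.8], [0.5, 0.7, 0.3, 0.2]),
    },
    "dataset_B": {
        "classifier_1": ([1.5, 1.9, 1.1], [1.0, 1.6, 0.8]),
        "classifier_2": ([0.7, 0.5, 0.9], [0.6, 0.8, 0.4]),
    },
    "dataset_C": {
        "classifier_1": ([3.0, 2.2, 2.8], [2.0, 2.5, 1.0]),
        "classifier_2": ([0.3, 0.9, 0.6], [0.4, 0.2, 0.5]),
    },
}

# One AUC per classifier per dataset, kept in the same dataset order
# so that the two lists are paired.
aucs = {"classifier_1": [], "classifier_2": []}
for name in toy_datasets:
    for clf, (pos, neg) in toy_datasets[name].items():
        aucs[clf].append(auc(pos, neg))

for clf, values in aucs.items():
    print(f"{clf}: mean AUC {mean(values):.2f} "
          f"(range, {min(values):.2f}-{max(values):.2f})")

t_stat, df = paired_t(aucs["classifier_1"], aucs["classifier_2"])
print(f"paired t = {t_stat:.2f} with {df} degrees of freedom")
```

Pairing by dataset matters here: each classifier is scored on the same samples, so the test is applied to the per-dataset AUC differences rather than to two independent groups. If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic together with a p-value.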