ABSTRACT: Digital pathology is increasingly prominent in neurodegenerative disease research, but variability in immunohistochemical staining intensity between staining batches prevents large-scale comparative studies. Here we provide a statistically rigorous method to account for staining batch effects in a large sample of brain tissue with frontotemporal lobar degeneration with tau inclusions (FTLD-Tau, N = 39) or TDP-43 inclusions (FTLD-TDP, N = 53). We analyzed the relationship between duplicate measurements of digital pathology, i.e., percent area occupied by pathology (%AO) for grey matter (GM) and white matter (WM), from two distinct staining batches. We found a significant difference in duplicate measurements from distinct staining batches in FTLD-Tau (mean difference: GM = 1.13 ± 0.44, WM = 1.28 ± 0.56; p < 0.001) and FTLD-TDP (GM = 0.95 ± 0.66, WM = 0.90 ± 0.77; p < 0.001), and these measurements were linearly related (R-squared [Rsq]: FTLD-Tau GM = 0.92, WM = 0.92; FTLD-TDP GM = 0.75, WM = 0.78; p < 0.001 all). We therefore used linear regression to transform %AO from distinct staining batches into equivalent values. Using a train-test set design, we examined transformation prerequisites (i.e., Rsq) from linear-modeling in training sets, and we applied equivalence factors (i.e., beta, intercept) to independent testing sets to determine transformation outcomes (i.e., intraclass correlation coefficient [ICC]). First, random iterations (×100) of linear regression showed that smaller training sets (N = 12-24), feasible for prospective use, have acceptable transformation prerequisites (mean Rsq: FTLD-Tau ≥ 0.9; FTLD-TDP ≥ 0.7). When cross-validated on independent complementary testing sets, in FTLD-Tau, N = 12 training sets resulted in 100% of GM and WM transformations with optimal transformation outcomes (ICC ≥ 0.8), while in FTLD-TDP N = 24 training sets resulted in optimal ICC in testing sets (GM = 72%, WM = 98%).
We therefore propose training sets of N = 12 in FTLD-Tau and N = 24 in FTLD-TDP for prospective transformations. The transformation significantly reduced batch-related differences in duplicate measurements in FTLD-Tau (GM/WM: p < 0.001 both) and FTLD-TDP (GM/WM: p < 0.001 both), and decreased the necessary sample size estimated in a power analysis in FTLD-Tau (GM: -40%; WM: -34%) and FTLD-TDP (GM: -20%; WM: -30%). Finally, we tested the generalizability of our approach using a second, open-source image analysis platform and found similar results. We conclude that a small sample of tissue stained in duplicate can be used to account for pre-analytical variability such as staining batch effects, thereby improving methods for future studies.
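
The train-test procedure summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the data are synthetic (a hypothetical linear batch effect with slope 0.6 and offset 3.0 plus Gaussian noise), the regression predicts reference-batch values from the second batch, and the ICC is computed as the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), since the abstract does not state which ICC variant was used. All function and variable names are illustrative.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = beta * x + intercept.

    Returns (beta, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    intercept = my - beta * mx
    return beta, intercept, (sxy ** 2) / (sxx * syy)


def icc_2_1(a, b):
    """ICC(2,1) for two measurements per subject (assumed ICC variant)."""
    a, b = list(a), list(b)
    n, k = len(a), 2
    grand = (sum(a) + sum(b)) / (n * k)
    row_means = [(p + q) / 2 for p, q in zip(a, b)]
    col_means = [mean(a), mean(b)]
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((v - grand) ** 2 for v in a + b)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-batch mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def one_iteration(ref, new, n_train, rng):
    """Fit on a random training set, evaluate on the complementary test set."""
    idx = list(range(len(ref)))
    rng.shuffle(idx)
    train, test = idx[:n_train], idx[n_train:]
    beta, intercept, rsq = fit_line([new[i] for i in train],
                                    [ref[i] for i in train])
    ref_test = [ref[i] for i in test]
    raw_test = [new[i] for i in test]
    transformed = [beta * v + intercept for v in raw_test]
    return rsq, icc_2_1(ref_test, raw_test), icc_2_1(ref_test, transformed)


# Synthetic duplicate measurements (hypothetical values, not study data).
rng = random.Random(0)
ref_batch = [rng.uniform(0.0, 10.0) for _ in range(39)]
new_batch = [0.6 * v + 3.0 + rng.gauss(0.0, 0.3) for v in ref_batch]

# 100 random train-test splits with a training set of N = 12.
results = [one_iteration(ref_batch, new_batch, 12, rng) for _ in range(100)]
mean_rsq = mean(r[0] for r in results)
mean_icc_raw = mean(r[1] for r in results)
mean_icc_transformed = mean(r[2] for r in results)
pct_optimal = 100.0 * sum(r[2] >= 0.8 for r in results) / len(results)

print(f"mean training Rsq:        {mean_rsq:.3f}")
print(f"mean test ICC (raw):      {mean_icc_raw:.3f}")
print(f"mean test ICC (transf.):  {mean_icc_transformed:.3f}")
print(f"% iterations ICC >= 0.8:  {pct_optimal:.0f}")
```

In this sketch the training Rsq plays the role of the transformation prerequisite and the test-set ICC the role of the transformation outcome. Swapping in real duplicate %AO values for `ref_batch` and `new_batch` would be the natural next step.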