ABSTRACT: Objective: To propose modifications to the current tumor, node, and metastasis (TNM) staging system for non-small cell lung cancer (NSCLC) that refine prognostication over anatomic extent and better distinguish and reflect the survival differences between lung adenocarcinoma and squamous cell carcinoma. Study Design: Three large cohorts were included in this study. The training cohort consisted of 124,788 patients from the Surveillance, Epidemiology, and End Results (SEER) database (2006-2015). The validation cohorts consisted of 4,247 patients from Zhongshan Hospital, Fudan University (FDZSH; 2005-2014) and People's Hospital, Peking University (PKUPH; 2000-2017). The algorithm generated a hierarchical clustering model based on unsupervised learning for survival data, using Kaplan-Meier curves and log-rank test statistics for recursive partitioning and selection of the principal groupings. Results: In the modified staging system, adenocarcinoma cases are usually assigned a lower stage than squamous cell carcinoma cases with the same TNM classification, reflecting the better outcome of adenocarcinoma compared with squamous cell carcinoma. The C-index of the modified staging system was significantly superior to that of the current staging system [SEER cohort: 0.722 (95% CI, 0.721-0.723) vs. 0.643 (95% CI, 0.640-0.647); FDZSH cohort: 0.720 (95% CI, 0.709-0.731) vs. 0.519 (95% CI, 0.450-0.586); and PKUPH cohort: 0.730 (95% CI, 0.705-0.735) vs. 0.728 (95% CI, 0.703-0.753)]. Conclusion: The modified staging system based on machine learning reflects the survival differences between lung adenocarcinoma and squamous cell carcinoma accurately and reliably. It may refine prognostication over anatomic extent.
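The idea of grouping TNM-histology combinations by survival similarity can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a simple greedy agglomerative variant that repeatedly merges the two groups with the smallest two-sample log-rank statistic, whereas the abstract describes recursive partitioning. The function names (`logrank_statistic`, `merge_similar_groups`), the cell labels, and the toy follow-up values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def logrank_statistic(group_a, group_b):
    """Two-sample log-rank chi-square statistic.

    Each group is a list of (time, event) pairs; event is 1 for death
    and 0 for a censored observation.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in B
        n = n_a + n_b
        d_a = sum(1 for time, e in group_a if time == t and e)
        d_b = sum(1 for time, e in group_b if time == t and e)
        d = d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def merge_similar_groups(cells, n_groups):
    """Greedily merge the pair of clusters with the most similar survival
    (smallest log-rank statistic) until n_groups clusters remain.

    cells maps a label to a list of (time, event) pairs.
    """
    clusters = {(label,): list(data) for label, data in cells.items()}
    while len(clusters) > n_groups:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: logrank_statistic(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(members) for members in clusters)


# Made-up follow-up data (months, event flag); not taken from any cohort.
toy_cells = {
    "ADC T1N0": [(60, 0), (55, 1), (70, 0), (65, 0)],
    "SCC T1N0": [(30, 1), (25, 1), (40, 1), (35, 0)],
    "ADC T2N0": [(30, 1), (25, 1), (40, 1), (35, 0)],
}

print(merge_similar_groups(toy_cells, 2))
# → [['ADC T1N0'], ['ADC T2N0', 'SCC T1N0']]
```

In this toy input the two cells with identical survival are merged first, so the squamous cell T1N0 cell lands in the same group as the adenocarcinoma T2N0 cell. A real analysis would more likely rely on an established survival library and on far larger groups than this sketch uses.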