ABSTRACT: BACKGROUND: Gene-expression companion diagnostic tests, such as the Oncotype DX (ODx) test, assess the risk of early-stage estrogen receptor-positive (ER+) breast cancers and help clinicians decide whether to use chemotherapy. However, these tests are typically expensive, time-consuming, and tissue-destructive.
METHODS: In this paper, we evaluate the ability of computer-extracted nuclear morphology features from routine hematoxylin and eosin (H&E) stained images of 178 early-stage ER+ breast cancer patients to predict the corresponding risk categories derived using the Oncotype DX test. A total of 216 features describing nuclear shape and architecture were extracted from each of the pathologic images. Four feature selection schemes were employed to identify the most discriminating features: Ranksum, Principal Component Analysis with Variable Importance on Projection (PCA-VIP), Maximum Relevance Minimum Redundancy with Mutual Information Difference (MRMR MID), and Maximum Relevance Minimum Redundancy with Mutual Information Quotient (MRMR MIQ). The selected features were used to train four machine learning classifiers (Random Forest, Neural Network, Support Vector Machine, and Linear Discriminant Analysis) via 3-fold cross-validation.
RESULTS: The four risk-category comparisons, with the best Area Under the receiver operating characteristic Curve (AUC) achieved by any classifier for each, were: 1) Low ODx and Low modified Bloom-Richardson (mBR) grade vs. High ODx and High mBR grade (Low-Low vs. High-High) (AUC = 0.83), 2) Low ODx vs. High ODx (AUC = 0.72), 3) Low ODx vs. Intermediate and High ODx (AUC = 0.58), and 4) Low and Intermediate ODx vs. High ODx (AUC = 0.65). The trained models were tested on an independent validation set of 53 cases comprising Low and High ODx risk categories, and achieved per-patient accuracies ranging from 75% to 86%.
CONCLUSION: Our results suggest that computerized analysis of digitized H&E pathology images of early-stage ER+ breast cancer may be able to predict the corresponding Oncotype DX risk categories.
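
As an illustration of the kind of workflow described in METHODS, the following is a minimal sketch, not the authors' implementation. It ranks features with a rank-sum (Mann-Whitney) statistic, repeats that selection inside each of three stratified folds, and reports the held-out AUC per fold. All function names and the synthetic data are assumptions made for this example. The signed sum of selected features is a deliberately trivial stand-in for the four classifiers named in the abstract, and features are ranked by effect size (distance of the single-feature AUC from 0.5) rather than by a p-value.

```python
def rank_sum_auc(pos, neg):
    """AUC from the Mann-Whitney U statistic: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def feature_auc(X, y, j):
    """Single-feature AUC of column j for labels y (1 = high risk, 0 = low risk)."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return rank_sum_auc(pos, neg)


def rank_features(X, y, top_k):
    """Return the top_k column indices, ranked by |AUC - 0.5| (largest first)."""
    scored = [(abs(feature_auc(X, y, j) - 0.5), j) for j in range(len(X[0]))]
    scored.sort(reverse=True)
    return [j for _, j in scored[:top_k]]


def stratified_folds(y, n_folds=3):
    """Deal the cases of each class round-robin so every fold holds both classes."""
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        members = [i for i, v in enumerate(y) if v == label]
        for k, i in enumerate(members):
            folds[k % n_folds].append(i)
    return folds


def cross_validated_aucs(X, y, top_k=1, n_folds=3):
    """Select features on the training folds only, then score the held-out fold."""
    aucs = []
    for held_out in stratified_folds(y, n_folds):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        selected = rank_features(X_tr, y_tr, top_k)
        # Orient each feature so that larger values point toward class 1.
        signs = {j: 1.0 if feature_auc(X_tr, y_tr, j) >= 0.5 else -1.0
                 for j in selected}
        scores = {i: sum(signs[j] * X[i][j] for j in selected) for i in held_out}
        pos = [scores[i] for i in held_out if y[i] == 1]
        neg = [scores[i] for i in held_out if y[i] == 0]
        aucs.append(rank_sum_auc(pos, neg))
    return aucs


# Synthetic data (hypothetical): column 0 separates the classes, column 1 does not.
y = [0] * 6 + [1] * 6
X = [[10 * y[i] + (i % 3), (i * 7) % 5] for i in range(12)]

print("selected feature indices:", rank_features(X, y, top_k=1))
print("held-out AUC per fold:", cross_validated_aucs(X, y))
```

Performing the feature ranking inside each training fold, rather than once on all cases, keeps the held-out AUC from being inflated by information leaking from the test fold into the selection step.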