Dataset Information

Model Comparison for Breast Cancer Prognosis Based on Clinical Data.

ABSTRACT: We compared the performance of several prediction techniques for breast cancer prognosis, based on AU-ROC (Area Under the ROC curve) for different prognosis periods. The analyzed dataset contained 1,981 patients; from an initial 25 variables, the 11 most common clinical predictors were retained. We compared eight models from a wide spectrum of predictive models, namely Generalized Linear Model (GLM), GLM-Net, Partial Least Squares (PLS), Support Vector Machines (SVM), Random Forests (RF), Neural Networks, k-Nearest Neighbors (k-NN) and Boosted Trees. To compare these models, a paired t-test was applied to the model performance differences obtained from data resampling. Random Forests, Boosted Trees, Partial Least Squares and GLM-Net had the best overall performance, although their scores were only slightly higher than those of the other models. The comparative analysis also allowed us to define a relative variable importance as the average of the variable importance from the different models. Two sets of variables were identified from this analysis. The first includes number of positive lymph nodes, tumor size, cancer grade and estrogen receptor, all of which have an important influence on model predictability. The second set includes variables related to histological parameters and treatment types. The short-term versus long-term contributions of the clinical variables were also analyzed from the comparative models. Among the various cancer treatment plans, the combination of chemotherapy and radiotherapy has the largest impact on cancer prognosis.
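The comparison procedure described in the abstract (AU-ROC per resample, a paired t-test on the per-resample differences, and variable importance averaged over models) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the synthetic scores, the bootstrap resampling scheme and the choice to scale each model's importances to sum to 1 before averaging are not taken from the paper, and only the Python standard library is used.

```python
import math
import random
import statistics


def auc(scores, labels):
    """AU-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-resample scores of two models."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def mean_importance(per_model):
    """Average each variable's importance over models.

    Assumption: each model's importances are first scaled to sum to 1
    so that models with different importance scales are comparable.
    """
    totals = {}
    for importances in per_model.values():
        scale = sum(importances.values())
        for name, value in importances.items():
            totals[name] = totals.get(name, 0.0) + value / scale
    return {name: total / len(per_model) for name, total in totals.items()}


def demo(n_patients=200, n_resamples=30, seed=0):
    """Compare two hypothetical models on synthetic data (not the study data)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_patients)]
    # Both toy models see the same signal; model_b adds more noise.
    model_a = [y + rng.gauss(0, 1.0) for y in labels]
    model_b = [y + rng.gauss(0, 2.0) for y in labels]
    auc_a, auc_b = [], []
    while len(auc_a) < n_resamples:
        # Bootstrap resample; both models are scored on the same indices (paired design).
        idx = [rng.randrange(n_patients) for _ in range(n_patients)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AU-ROC is undefined without both classes
        auc_a.append(auc([model_a[i] for i in idx], y))
        auc_b.append(auc([model_b[i] for i in idx], y))
    return paired_t(auc_a, auc_b)


if __name__ == "__main__":
    t_stat, dof = demo()
    print(f"paired t = {t_stat:.2f} with {dof} degrees of freedom")
```

Scoring both models on the same resample is what makes the test paired: the differences cancel out resample-to-resample variation that affects both models equally. A p-value would additionally require the Student t distribution, which the standard library does not provide.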

SUBMITTER: Boughorbel S

PROVIDER: S-EPMC4714871 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Model Comparison for Breast Cancer Prognosis Based on Clinical Data.

Sabri Boughorbel, Rashid Al-Ali, Naser Elkum

PLoS One, 2016-01-15, issue 1

PMID: 26771838

Similar Datasets

Project description:

Background: Breast cancer (BC) is the most common malignancy among women in the world. Alternative splicing (AS) is an important mechanism for regulating gene expression and producing proteome diversity, which is closely related to tumorigenesis. Understanding the role of AS in BC may be helpful to reveal new therapeutic targets for clinical interventions.

Methods: RNA-seq, clinical and AS data of TCGA-BRCA were downloaded from the TCGA and TCGA SpliceSeq databases. AS events associated with prognosis were filtered by univariate Cox regression. The AS risk model of BC was built by Lasso regression, random forest and multivariate Cox regression. The accuracy of the AS risk model and clinicopathological factors was evaluated by time-dependent receiver operating characteristic (ROC) curves. The significant factors were used to construct the nomogram model. Tumor microenvironment analysis, immune infiltration and immune checkpoint analysis were performed to show the differences between the high and low AS risk groups. The expression differences of genes of AS events constituting the risk model in tumor tissues and normal tissues were analyzed, the genes with significant differences were screened, and their relationship with prognosis, tumor microenvironment, immune infiltration and immune checkpoint was analyzed. Finally, Pearson correlation analysis was used to calculate the correlation coefficient between splicing factors (SF) and prognostic AS events in TCGA-BRCA. The results were imported into Cytoscape, and the associated network was constructed.

Results: A total of 21,232 genes had 45,421 AS events occurring in TCGA-BRCA, while 1,604 AS events were found to be significantly correlated with survival. The BRCA risk model consisted of 5 AS events, (TTC39C|44853|AT * -2.67) + (HSPBP1|52052|AP * -4.28) + (MAZ|35942|ES * 2.34) + (ANK3|11845|AP * 1.18) + (ZC3HAV1|81940|AT * 1.59), which were confirmed to be valuable for predicting BRCA prognosis to a certain degree, including ROC curve, survival analysis, tumor microenvironment analysis, immune infiltration and immune checkpoint analysis. Based on this, we constructed a nomogram prediction model composed of clinicopathological features and the AS risk signature. Furthermore, we found that MAZ was a core gene indicating the connection of tumor prognosis and AS events. Ultimately, a network of SF-AS regulation was established to reveal the relationship between them.

Conclusions: We constructed a nomogram model combined with clinicopathological features and AS risk score to predict the prognosis of BC. The detailed analysis of tumor microenvironment and immune infiltration in the AS risk model may further reveal the potential mechanisms of BC recurrence and development.
