Dataset Information

CrossLink: a novel method for cross-condition classification of cancer subtypes.

ABSTRACT: We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. Conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets have the same distribution across all conditions, because the class-specific gene signatures change with the condition. A trained classifier may therefore work well under one condition but not under another. To address this limitation of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. Instead, it exploits the fact that the signature is unique to its associated class under any condition, and it employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition prediction of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73%, the highest among the methods we evaluated. We also applied the algorithm to a set of breast cancer tumors from an Arab population to assign a PAM50 classification to each tumor based on its gene expression profile. A novel algorithm, CrossLink, for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
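The general idea described in the abstract (cluster the prediction dataset without labels, then link each cluster to a reference class) can be illustrated with a toy sketch. This is not the authors' CrossLink implementation: the abstract does not name the clustering algorithm or the linking rule, so the k-means step, the correlation-based matching, the function names, and all numbers below are assumptions chosen for illustration. The sketch also assumes one cluster per class and roughly balanced classes.

```python
from itertools import permutations
from statistics import mean

def centroid(rows):
    """Per-gene mean of a list of expression profiles."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def kmeans(samples, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [samples[0]]
    while len(centers) < k:
        centers.append(max(samples, key=lambda s: min(sq_dist(s, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(s, centers[j])) for s in samples]
        for j in range(k):
            members = [s for s, l in zip(samples, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    return labels, centers

def link_clusters(samples, reference):
    """Cluster new samples, then assign clusters to reference classes.

    Centroids are compared relative to their own dataset mean, so a
    per-gene shift between conditions does not affect the matching.
    """
    classes = list(reference)
    labels, centers = kmeans(samples, len(classes))
    new_mean = centroid(samples)
    ref_mean = centroid(list(reference.values()))
    rel_new = [[c - m for c, m in zip(ctr, new_mean)] for ctr in centers]
    rel_ref = {k: [c - m for c, m in zip(v, ref_mean)] for k, v in reference.items()}
    # Try every one-to-one assignment of classes to clusters; keep the best.
    best = max(permutations(classes),
               key=lambda p: sum(pearson(rel_new[j], rel_ref[p[j]])
                                 for j in range(len(p))))
    return [best[l] for l in labels]

# Hypothetical reference centroids for two classes over three genes.
reference = {"A": [5.0, 1.0, 1.0], "B": [1.0, 5.0, 1.0]}
# Hypothetical new samples measured under a condition that shifts genes 1 and 3.
samples = [[15.1, 1.0, 4.0], [11.0, 5.1, 3.9], [14.9, 1.2, 4.1], [11.2, 4.9, 4.0]]

# Baseline: nearest reference centroid, ignoring the condition shift.
naive = [min(reference, key=lambda k: sq_dist(s, reference[k])) for s in samples]
linked = link_clusters(samples, reference)
print(naive)
print(linked)
```

In this toy setting the nearest-centroid baseline is dominated by the condition shift, while the cluster-then-link approach relies only on the structure inside the new dataset. Real expression data would need many more genes, a more robust clustering method, and a check that the number of clusters matches the number of classes.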

SUBMITTER: Ma C

PROVIDER: S-EPMC5001207 | biostudies-literature | 2016 Aug

REPOSITORIES: biostudies-literature

Publications

CrossLink: a novel method for cross-condition classification of cancer subtypes.

Chifeng Ma, Konduru S Sastry, Mario Flore, Salah Gehani, Issam Al-Bozom, Yusheng Feng, Erchin Serpedin, Lotfi Chouchane, Yidong Chen, Yufei Huang

BMC Genomics, 2016-08-22

PMID: 27556419

Similar Datasets

Project description:

Background: Molecular markers based on gene expression profiles have been used in experimental and clinical settings to distinguish cancerous tumors in stage, grade, survival time, metastasis, and drug sensitivity. However, most significant gene markers are unstable (not reproducible) among data sets. We introduce a standardized method for representing cancer markers as 2-level hierarchical feature vectors, with a basic gene level as well as a second level of (more stable) pathway markers, for the purpose of discriminating cancer subtypes. This extends standard gene expression arrays with new pathway-level activation features obtained directly from off-the-shelf gene set enrichment algorithms such as GSEA. Such so-called pathway-based expression arrays are significantly more reproducible across datasets. Such reproducibility will be important for clinical usefulness of genomic markers, and augment currently accepted cancer classification protocols.

Results: The present method produced more stable (reproducible) pathway-based markers for discriminating breast cancer metastasis and ovarian cancer survival time. Between two datasets for breast cancer metastasis, the intersection of standard significant gene biomarkers totaled 7.47% of selected genes, compared to 17.65% using pathway-based markers; the corresponding percentages for ovarian cancer datasets were 20.65% and 33.33% respectively. Three pathways, consisting of Type_1_diabetes mellitus, Cytokine-cytokine_receptor_interaction and Hedgehog_signaling (all previously implicated in cancer), are enriched in both the ovarian long survival and breast non-metastasis groups. In addition, integrating pathway and gene information, we identified five (ID4, ANXA4, CXCL9, MYLK, FBXL7) and six (SQLE, E2F1, PTTG1, TSTA3, BUB1B, MAD2L1) known cancer genes significant for ovarian and breast cancer respectively.

Conclusions: Standardizing the analysis of genomic data in the process of cancer staging, classification and analysis is important as it has implications for both pre-clinical as well as clinical studies. The paradigm of diagnosis and prediction using pathway-based biomarkers as features can be an important part of the process of biomarker-based cancer analysis, and the resulting canonical (clinically reproducible) biomarkers can be important in standardizing genomic data. We expect that identification of such canonical biomarkers will improve clinical utility of high-throughput datasets for diagnostic and prognostic applications.
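The reproducibility figures quoted above are overlap percentages between marker sets selected in two datasets. A minimal sketch of such a calculation follows. The description does not state the exact denominator ("of selected genes"), so the use of the union of both sets here is an assumption, and the function name and the marker identifiers are hypothetical placeholders rather than data from the study.

```python
def overlap_percent(markers_a, markers_b):
    """Shared markers as a percentage of all markers selected in either dataset.

    Assumption: the denominator is the union of the two selections.
    """
    a, b = set(markers_a), set(markers_b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0

# Hypothetical gene-level selections from two datasets.
genes_a = {"g1", "g2", "g3", "g4"}
genes_b = {"g3", "g4", "g5", "g6"}
# Hypothetical pathway-level selections from the same two datasets.
pathways_a = {"p1", "p2", "p3"}
pathways_b = {"p2", "p3", "p4"}

gene_overlap = overlap_percent(genes_a, genes_b)
pathway_overlap = overlap_percent(pathways_a, pathways_b)
print(f"gene-level overlap: {gene_overlap:.2f}%")
print(f"pathway-level overlap: {pathway_overlap:.2f}%")
```

A higher percentage at the pathway level than at the gene level is the pattern the description reports as evidence of more reproducible markers.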

Project description:

Background: Breast invasive carcinoma (BRCA) is not a single disease, as each subtype has a distinct morphological structure. Although several computational methods have been proposed for breast cancer subtype identification, the specific interaction mechanisms of the genes involved in the subtypes are still incompletely understood. Identifying and exploring the interaction mechanisms of genes for each subtype of breast cancer can have an important impact on personalized treatment for different patients.

Methods: We integrate the biological importance of genes from gene regulatory networks into the differential expression analysis and obtain weighted differentially expressed genes (weighted DEGs). A gene with a high weight regulates more target genes and thus holds more biological importance. In addition, we constructed gene coexpression networks for control and experiment groups, and the significantly different interaction structures led us to design a corresponding Gene Ontology (GO) enrichment based on gene coexpression networks (GOEGCN). The GOEGCN considers the two-sided distinction between gene coexpression networks for control and experiment groups. The method allows us to study how the modulated coexpressed gene pairs impact biological functions at the GO level.

Results: We modeled a binary classification with weighted DEGs for each subtype. The binary classifier could make a good prediction for an unseen sample, and the experimental results validated the effectiveness of our proposed approaches. The novel enriched GO terms based on GOEGCN for control and experiment groups of each subtype explain, to some extent, the specific biological function changes according to the two-sided distinction of coexpression network structures.

Conclusion: The weighted DEGs contain biological importance derived from the gene regulatory network. Based on the weighted DEGs, five binary classifiers were learned and showed good performance on the "Sensitivity," "Specificity," "Accuracy," "F1," and "AUC" metrics. The GOEGCN with weighted DEGs for control and experiment groups presented novel GO enrichment analysis results, and the novel enriched GO terms would, to some extent, further unveil the changes in specific biological functions among all the BRCA subtypes. The R code in this research is available at https://github.com/yxchspring/GOEGCN_BRCA_Subtypes.

Project description:

Background: Nasopharyngeal carcinoma (NPC) treatment is largely based on a 'one-drug-fits-all' strategy in patients with similar pathological characteristics. However, given its biological heterogeneity, patients at the same clinical stage or receiving similar therapies exhibit significant clinical differences. Thus, novel molecular subgroups based on these characteristics may improve therapeutic outcomes.

Methods: Herein, 192 treatment-naïve NPC samples with corresponding clinicopathological information were obtained from Fujian Cancer Hospital between January 2015 and January 2018. The gene expression profiles of the samples were obtained by RNA sequencing. Molecular subtypes were identified by consensus clustering. External NPC cohorts were used as the validation sets.

Results: Patients with NPC were classified into immune, metabolic, and proliferative molecular subtypes with distinct clinical features. Additionally, this classification was repeatable and predictable as validated by the external NPC cohorts. Metabolomics showed that arachidonic acid metabolites were associated with NPC malignancy. We also identified several key genes in each subtype using a weighted correlation network analysis. Furthermore, a prognostic risk model based on these key genes was developed and was significantly associated with disease-free survival (hazard ratio, 1.11; 95% CI, 1.07-1.16; P < 0.0001), which was further validated by an external NPC cohort (hazard ratio, 7.71; 95% CI, 1.39-42.73; P < 0.0001). Moreover, the 1-, 3-, and 5-year areas under the curve were 0.84 (95% CI, 0.74-0.94), 0.81 (95% CI, 0.73-0.89), and 0.82 (95% CI, 0.73-0.90), respectively, demonstrating a high predictive value.

Conclusions: Overall, we defined a novel classification of nasopharyngeal carcinoma (immune, metabolic, and proliferative subtypes). Among these subtypes, the metabolic and proliferative subtypes were associated with advanced stage and poor prognosis of NPC patients, whereas the immune subtype was linked to early stage and favorable prognosis.
