Dataset Information

Biomarker signature identification in "omics" data with multi-class outcome.

ABSTRACT: Biomarker signature identification in "omics" data is a complex challenge that requires specialized feature selection algorithms. The objective of these algorithms is to select the smallest set(s) of molecular quantities that can predict a given outcome (target) with maximal predictive performance. This task is even more challenging when the outcome comprises multiple classes; for example, one may be interested in identifying the genes whose expression allows discrimination among different types of cancer (nominal outcome) or among different stages of the same cancer, e.g. Stages 1, 2, 3 and 4 of lung adenocarcinoma (ordinal outcome). In this work, we consider a particularly successful class of feature selection methods: constraint-based, local causal discovery algorithms. These algorithms rely on a series of conditional independence tests. We extend them to problems with continuous predictors and multi-class outcomes by developing and equipping them with an appropriate conditional independence test for both nominal and ordinal multi-class targets. The test is based on multinomial logistic regression and employs the log-likelihood ratio test for model selection. We present a comparative experimental evaluation on seven real-world, high-dimensional gene-expression datasets. Within the scope of our analysis, the results indicate that the new conditional independence test allows the identification of smaller and better-performing signatures for multi-class outcome datasets than the current alternatives for performing the independence tests.
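The abstract describes a conditional independence test built on multinomial logistic regression and a log-likelihood ratio test. The following is a minimal sketch of that idea for a nominal outcome only, not the authors' implementation: it fits a model of the outcome on the conditioning set Z, fits a second model on Z plus the candidate variable x, and compares the two with a chi-squared test. The function names `lr_ci_test` and `_max_loglik`, the L-BFGS optimizer, and the choice of the first class as reference are assumptions made for illustration; the ordinal case mentioned in the abstract would need a different regression model.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import chi2


def _max_loglik(X, y_idx, n_classes):
    """Maximized log-likelihood of a multinomial logit model with intercept."""
    n, p = X.shape
    Xb = np.column_stack([np.ones(n), X])      # prepend the intercept column
    Y = np.eye(n_classes)[y_idx]               # one-hot encoded outcome

    def nll_and_grad(w):
        W = w.reshape(p + 1, n_classes - 1)
        # Class 0 is the reference class, so its logit is fixed at 0.
        logits = np.column_stack([np.zeros(n), Xb @ W])
        log_p = logits - logsumexp(logits, axis=1, keepdims=True)
        nll = -(Y * log_p).sum()
        grad = Xb.T @ (np.exp(log_p) - Y)[:, 1:]
        return nll, grad.ravel()

    start = np.zeros((p + 1) * (n_classes - 1))
    res = minimize(nll_and_grad, start, jac=True, method="L-BFGS-B")
    return -res.fun


def lr_ci_test(x, y, Z=None):
    """Test whether x is independent of the nominal outcome y given Z.

    Returns (statistic, p_value); a small p-value suggests dependence.
    """
    classes, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    k, n = len(classes), len(y_idx)
    Z = np.empty((n, 0)) if Z is None else np.asarray(Z, float).reshape(n, -1)
    x = np.asarray(x, float).reshape(n, 1)

    ll_null = _max_loglik(Z, y_idx, k)                   # outcome ~ Z
    ll_full = _max_loglik(np.hstack([Z, x]), y_idx, k)   # outcome ~ Z + x

    # Clamp at zero to absorb small optimizer inaccuracies.
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # One extra predictor adds (k - 1) coefficients to the full model.
    return stat, chi2.sf(stat, df=k - 1)
```

Calling `lr_ci_test(x, y, Z)` with a one-dimensional predictor `x`, class labels `y`, and an optional matrix `Z` of conditioning variables returns the test statistic and its p-value. A constraint-based feature selection algorithm would call such a test repeatedly with different conditioning sets.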

SUBMITTER: Lagani V

PROVIDER: S-EPMC3962136 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Biomarker signature identification in "omics" data with multi-class outcome.

Vincenzo Lagani, George Kortas, Ioannis Tsamardinos

Computational and Structural Biotechnology Journal, 2013-06-08

PMID: 24688712

Similar Datasets

Project description: High-throughput sequencing methods have brought about a huge change in omics-based biomedical research. Integrating various omics data can help identify correlations across data modalities, thus improving our understanding of the underlying biological mechanisms and their complexity. Nevertheless, most existing graph-based feature extraction methods overlook the complementary information and correlations across modalities. Moreover, these methods tend to treat the features of each omics modality equally, which contradicts current biological principles. To address these challenges, we introduce a novel approach for integrating multi-omics data termed Multi-Omics hypeRgraph integration nEtwork (MORE). MORE first constructs a comprehensive hyperedge group by extensively investigating the informative correlations within and across modalities. Subsequently, the multi-omics hypergraph encoding module is employed to learn the enriched omics-specific information. The multi-omics self-attention mechanism is then used to adaptively aggregate valuable correlations across modalities for representation learning and for making the final prediction. We assess MORE's performance on datasets characterized by messenger RNA (mRNA) expression, deoxyribonucleic acid (DNA) methylation, and microRNA (miRNA) expression for Alzheimer's disease, invasive breast carcinoma, and glioblastoma. The results from three classification tasks highlight the competitive advantage of MORE over current state-of-the-art (SOTA) methods. The results also show that MORE can identify a greater variety of disease-related biomarkers than existing methods, highlighting its advantages in biomedical data mining and interpretation. Overall, MORE can be investigated as a valuable tool for facilitating multi-omics analysis and novel biomarker discovery.
Our code and data can be publicly accessed at https://github.com/Wangyuhanxx/MORE.

Project description: Background: High-dimensional omics data integration has emerged as a prominent avenue within the healthcare industry, presenting substantial potential to improve predictive models. However, the data integration process faces several challenges, including data heterogeneity, priority sequence in which data blocks are prioritized for rendering predictive information contained in multiple blocks, assessing the flow of information from one omics level to the other and multicollinearity. Methods: We propose the Priority-Elastic net algorithm, a hierarchical regression method extending Priority-Lasso for the binary logistic regression model by incorporating a priority order for blocks of variables while fitting Elastic-net models sequentially for each block. The fitted values from each step are then used as an offset in the subsequent step. Additionally, we considered the adaptive elastic-net penalty within our priority framework to compare the results. Results: The Priority-Elastic net and Priority-Adaptive Elastic net algorithms were evaluated on a brain tumor dataset available from The Cancer Genome Atlas (TCGA), accounting for transcriptomics, proteomics, and clinical information measured over two glioma types: Lower-grade glioma (LGG) and glioblastoma (GBM). Conclusion: Our findings suggest that the Priority-Elastic net is a highly advantageous choice for a wide range of applications. It offers moderate computational complexity, flexibility in integrating prior knowledge while introducing a hierarchical modeling perspective, and, importantly, improved stability and accuracy in predictions, making it superior to the other methods discussed. This evolution marks a significant step forward in predictive modeling, offering a sophisticated tool for navigating the complexities of multi-omics datasets in pursuit of precision medicine's ultimate goal: personalized treatment optimization based on a comprehensive array of patient-specific data.
This framework can be generalized to time-to-event, Cox proportional hazards regression and multicategorical outcomes. A practical implementation of this method is available upon request in R script, complete with an example to facilitate its application.
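The sequential scheme described above (fit an elastic-net logistic model on the highest-priority block, then pass its fitted linear predictor as an offset to the next block) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' R script: the function names `fit_enet_logistic` and `priority_elastic_net` are hypothetical, the penalty strength is fixed rather than chosen by cross-validation, the fit uses plain proximal gradient descent, and the adaptive variant is not covered.

```python
import numpy as np


def _sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def fit_enet_logistic(X, y, offset, alpha=0.05, l1_ratio=0.5, n_iter=2000):
    """Elastic-net logistic regression with a fixed offset (proximal gradient).

    Minimizes mean logistic loss
        + alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2),
    leaving the intercept unpenalized. y must contain 0/1 values.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])   # intercept column first
    w = np.zeros(p + 1)
    ridge = alpha * (1.0 - l1_ratio)
    lasso = alpha * l1_ratio
    # Step size from a Lipschitz bound on the smooth part of the objective.
    step = 1.0 / (np.linalg.norm(Xa, 2) ** 2 / (4.0 * n) + ridge)
    for _ in range(n_iter):
        resid = _sigmoid(offset + Xa @ w) - y
        grad = Xa.T @ resid / n
        grad[1:] += ridge * w[1:]
        w = w - step * grad
        # Soft-thresholding applies the L1 penalty to the coefficients only.
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lasso, 0.0)
    return w[0], w[1:]


def priority_elastic_net(blocks, y, **kwargs):
    """Fit blocks in priority order, carrying the linear predictor forward."""
    offset = np.zeros(len(y))
    fits = []
    for X in blocks:                        # blocks are ordered by priority
        b0, beta = fit_enet_logistic(X, y, offset, **kwargs)
        fits.append((b0, beta))
        offset = offset + b0 + X @ beta     # offset for the next block
    return fits, offset
```

Here `blocks` is a list of predictor matrices ordered from highest to lowest priority and `y` is a 0/1 outcome vector. The function returns one `(intercept, coefficients)` pair per block together with the final linear predictor. Because lower-priority blocks only model what earlier blocks leave unexplained, the ordering encodes the prior knowledge mentioned in the description.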

