Dataset Information

Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods.

ABSTRACT:

Background

Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of biomarker identification across different feature selection techniques.

Methods

We use six different recursive feature elimination methods to select gene signatures of HCC from TCGA liver cancer data. The genes shared by all six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, providing a statistical interpretation of feature selection in machine learning methods. We then validate the screened biomarkers with several methods.
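
The elimination loop behind recursive feature elimination can be sketched as follows. This is a minimal, standard-library illustration and not the authors' pipeline: it assumes a single logistic regression fitted by plain gradient descent in place of the six classifiers, it omits cross-validation, and the function names (`fit_logistic`, `rfe`), the gene names, and the toy expression values are all hypothetical.

```python
import math


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, names, n_keep):
    """Refit on the surviving features, drop the one with the smallest |weight|, repeat."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w, _ = fit_logistic(X_active, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Toy expression matrix (made up): GENE_A separates the classes, the others are noise.
names = ["GENE_A", "GENE_B", "GENE_C"]
X = [
    [2.0, 0.1, -0.2], [1.5, -0.3, 0.1], [1.8, 0.2, 0.3], [2.2, -0.1, -0.1],
    [-2.0, 0.2, -0.1], [-1.7, -0.2, 0.2], [-1.9, 0.1, 0.1], [-2.1, -0.1, -0.3],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected = rfe(X, y, names, n_keep=1)
print(selected)
```

In a cross-validated variant, the number of features to keep would be chosen by comparing held-out performance across subset sizes rather than fixed in advance as `n_keep` is here.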

Results

In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are referred to as the identified biomarkers. We interpret the machine-learning feature selection process statistically using the AIC. Furthermore, the features selected by backward stepwise logistic regression under the minimum-AIC criterion are completely contained in the identified biomarkers. The classification results verify the superiority of this interpretable, robust biomarker discovery method.
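
Backward stepwise selection under a minimum-AIC criterion can be sketched as below, using the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L the maximized likelihood. This is an illustration under stated assumptions, not the authors' code: model fitting is abstracted into a caller-supplied function, and `backward_stepwise`, `toy_log_likelihood`, and the gene names and gains are hypothetical stand-ins for a real logistic regression fit.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def backward_stepwise(features, fit_log_likelihood):
    """Start from all features and drop one at a time while the AIC keeps decreasing.

    fit_log_likelihood(subset) must return the maximized log-likelihood of a
    model fitted on `subset`; one extra parameter is counted for the intercept.
    """
    current = frozenset(features)
    best = aic(fit_log_likelihood(current), len(current) + 1)
    while current:
        # Score every model obtained by removing a single feature.
        candidates = [
            (aic(fit_log_likelihood(current - {f}), len(current)), f)
            for f in sorted(current)
        ]
        candidate_aic, drop = min(candidates)
        if candidate_aic >= best:
            break  # no single removal lowers the AIC any further
        current, best = current - {drop}, candidate_aic
    return current, best


def toy_log_likelihood(subset):
    """Made-up stand-in for a model fit: each gene adds a fixed log-likelihood gain."""
    gain = {"G1": 10.0, "G2": 6.0, "G3": 0.4, "G4": 0.1}
    return -50.0 + sum(gain[g] for g in subset)


kept, kept_aic = backward_stepwise(["G1", "G2", "G3", "G4"], toy_log_likelihood)
print(sorted(kept), kept_aic)
```

A feature is removed only when the parameter penalty it saves (2 per parameter) outweighs the log-likelihood it contributes, which is why the weakly informative toy genes are dropped and the strongly informative ones are retained.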

Conclusions

We find overlaps among the gene subsets, which contain different numbers of features, selected by RFE-CV with the six classifiers. The AIC values in the model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets are more biologically meaningful. The quality of feature selection is improved by intersecting the biomarkers selected by different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
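
Intersecting the subsets and counting how many subsets each gene appears in can be sketched in a few lines. The classifier labels and gene names below are hypothetical placeholders, not results from the paper.

```python
from collections import Counter

# Hypothetical gene subsets, one per classifier; labels and contents are made up.
subsets = {
    "clf_1": {"G1", "G2", "G3", "G5"},
    "clf_2": {"G1", "G2", "G4"},
    "clf_3": {"G1", "G2", "G3"},
    "clf_4": {"G1", "G2", "G6"},
    "clf_5": {"G1", "G2", "G3", "G4"},
    "clf_6": {"G1", "G2"},
}

# Genes shared by every subset: the candidate robust biomarkers.
robust = set.intersection(*subsets.values())

# Number of subsets each gene appears in, most frequently selected first.
support = Counter(gene for genes in subsets.values() for gene in genes)
ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))

print(sorted(robust))
print(ranked)
```

Ranking by support gives a graded view alongside the strict intersection: a gene selected by five of six classifiers is excluded from `robust` but still appears near the top of `ranked`.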

SUBMITTER: Zhang Z

PROVIDER: S-EPMC8386074 | biostudies-literature | 2021 Aug

REPOSITORIES: biostudies-literature

Publications

Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods.

Zhang Zishuang, Liu Zhi-Ping

BMC Medical Genomics, 2021-08-25, Suppl 1

PMID: 34433487

Similar Datasets

Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data.
| S-EPMC4937038 | biostudies-literature

Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data.
| S-EPMC8165452 | biostudies-literature

A critical assessment of feature selection methods for biomarker discovery in clinical proteomics.
| S-EPMC3536906 | biostudies-literature

Biomarker-driven drug repurposing for NAFLD-associated hepatocellular carcinoma using machine learning integrated ensemble feature selection.
| S-EPMC12043677 | biostudies-literature

Biomarker discovery in inflammatory bowel diseases using network-based feature selection.
| S-EPMC6874333 | biostudies-literature

Hepatocellular Carcinoma (HCC) biomarker discovery cohort for small RNA profiling
2022-06-01 | E-MTAB-8528 | biostudies-arrayexpress

Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data.
| S-EPMC6532608 | biostudies-literature

A biomarker discovery of acute myocardial infarction using feature selection and machine learning.
| S-EPMC10191821 | biostudies-literature

Optimizing hybrid ensemble feature selection strategies for transcriptomic biomarker discovery in complex diseases.
| S-EPMC11237901 | biostudies-literature

Lung adenocarcinoma and lung squamous cell carcinoma cancer classification, biomarker identification, and gene expression analysis using overlapping feature selection methods.
| S-EPMC8233431 | biostudies-literature