Integration of gene interaction information into a reweighted random survival forest approach for accurate survival prediction and survival biomarker discovery.
Ontology highlight
ABSTRACT: Accurately predicting patient risk and identifying survival biomarkers are two important tasks in survival analysis. For the emerging high-throughput gene expression data, random survival forest (RSF) is attracting more and more attention as it not only shows excellent performance on survival prediction problems with high-dimensional variables, but also is capable of identifying important variables according to variable importance automatically calculated within the algorithm. However, RSF still suffers from some problems such as limited predictive accuracy on independent datasets and limited biological interpretation of survival biomarkers. In this study, we integrated gene interaction information into a Reweighted RSF model (RRSF) to improve predictive accuracy and identify biologically meaningful survival markers. We applied RRSF to the prediction of patients with glioblastoma multiforme (GBM) and esophageal squamous cell carcinoma (ESCC). With a reconstructed global pathway network and an mRNA-lncRNA co-expression network as the prior gene interaction information, RRSF showed better overall predictive performance than RSF on three GBM and two ESCC datasets. In addition, RRSF identified a two-gene and three-lncRNA signature, which showed robust prognostic values and had high biological relevance to the development of GBM and ESCC, respectively.
SUBMITTER: Wang W
PROVIDER: S-EPMC6123437 | biostudies-literature | 2018 Sep
REPOSITORIES: biostudies-literature
ACCESS DATA