Dataset Information

CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.


ABSTRACT: Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs again the classification procedure until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and the relation with a particular cancer are deduced. Availability: dmb.iasi.cnr.it/camur.php. Contact: emanuel@iasi.cnr.it. Supplementary data are available at Bioinformatics online.
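The iterative procedure described in the abstract (learn a rule-based model, take the power set of its genes, remove those combinations from the data, and classify again until a stopping criterion holds) can be illustrated with a minimal Python sketch. This is not the CAMUR implementation: the greedy threshold learner `learn_rule`, the breadth-first queue of excluded gene sets, the parameters `min_accuracy` and `max_iterations`, and the toy expression values are all assumptions introduced here for illustration only.

```python
from collections import deque
from itertools import combinations


def power_set(genes):
    """Return every non-empty subset of `genes` as a frozenset."""
    genes = sorted(genes)
    return [frozenset(c) for r in range(1, len(genes) + 1)
            for c in combinations(genes, r)]


def predict(rule, sample):
    """A rule is a list of (gene, threshold, is_high); all conditions must hold."""
    return all((sample[g] > t) == is_high for g, t, is_high in rule)


def accuracy(rule, samples, labels):
    hits = sum(predict(rule, s) == y for s, y in zip(samples, labels))
    return hits / len(labels)


def learn_rule(samples, labels, allowed, max_conditions=2):
    """Hypothetical stand-in for a rule learner: greedily add the threshold
    condition that most improves training accuracy."""
    rule, best_acc = [], 0.0
    for _ in range(max_conditions):
        best = None
        used = {g for g, _, _ in rule}
        for gene in sorted(allowed - used):
            values = sorted({s[gene] for s in samples})
            for lo, hi in zip(values, values[1:]):
                for is_high in (True, False):
                    cand = rule + [(gene, (lo + hi) / 2, is_high)]
                    acc = accuracy(cand, samples, labels)
                    if acc > best_acc:
                        best, best_acc = cand, acc
        if best is None:  # no condition improves the rule any further
            break
        rule = best
    return rule, best_acc


def extract_models(samples, labels, genes, min_accuracy=0.9, max_iterations=50):
    """Collect alternative models by hiding gene combinations and relearning.
    Assumed stopping criterion: empty queue or `max_iterations` reached."""
    models = {}                     # frozenset of genes -> (rule, accuracy)
    queue = deque([frozenset()])    # gene sets hidden from the learner
    seen = {frozenset()}
    iterations = 0
    while queue and iterations < max_iterations:
        excluded = queue.popleft()
        iterations += 1
        rule, acc = learn_rule(samples, labels, set(genes) - excluded)
        if not rule or acc < min_accuracy:
            continue                # this branch yields no reliable model
        used = frozenset(g for g, _, _ in rule)
        models.setdefault(used, (rule, acc))
        for subset in power_set(used):
            nxt = excluded | subset
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return models


# Toy expression values (invented): True marks a case, False a control.
GENES = ["A", "B", "C", "D"]
SAMPLES = [
    {"A": 9, "B": 1, "C": 8, "D": 8},
    {"A": 8, "B": 2, "C": 9, "D": 7},
    {"A": 7, "B": 1, "C": 7, "D": 9},
    {"A": 1, "B": 8, "C": 8, "D": 1},
    {"A": 2, "B": 9, "C": 1, "D": 8},
    {"A": 1, "B": 7, "C": 2, "D": 2},
]
LABELS = [True, True, True, False, False, False]

for gene_set, (rule, acc) in extract_models(SAMPLES, LABELS, GENES).items():
    print(sorted(gene_set), rule, acc)
```

The toy data are built so that genes A and B each separate cases from controls on their own, while C and D do so only jointly. On this input the sketch recovers three equivalent models (one on A, one on B, one on C and D together), which is the kind of redundancy a single-model classifier would not expose. Accuracy here is measured on the training samples for brevity; a realistic setup would evaluate on held-out data.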

SUBMITTER: Cestarelli V 

PROVIDER: S-EPMC4795614 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature

Publications

CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

Valerio Cestarelli, Giulia Fiscon, Giovanni Felici, Paola Bertolazzi, Emanuel Weitschek

Bioinformatics (Oxford, England), 2015-10-30, issue 5


Similar Datasets

| S-EPMC8536868 | biostudies-literature
| S-EPMC6203208 | biostudies-literature
| S-EPMC10959666 | biostudies-literature
| S-EPMC4148917 | biostudies-literature
| S-EPMC5568128 | biostudies-literature
| S-EPMC7297975 | biostudies-literature
| S-EPMC4818202 | biostudies-other
| S-EPMC4758103 | biostudies-literature
| S-EPMC5633036 | biostudies-literature
| S-EPMC10135911 | biostudies-literature