
Dataset Information


A pathway-based data integration framework for prediction of disease progression.


ABSTRACT:

Motivation

Within medical research there is an increasing trend toward deriving multiple types of data from the same individual. The most effective prognostic prediction methods should use all available data, as this maximizes the amount of information used. In this article, we consider a variety of learning strategies to boost prediction performance based on the use of all available data.

Implementation

We consider data integration using multiple kernel learning, a family of supervised learning methods. We propose a scheme in which feature selection by statistical score is performed separately per data type and by pathway membership. We further consider introducing a confidence measure for the class assignment, both to remove some ambiguously labeled data points from the training data and to implement a cautious classifier that only makes predictions when the associated confidence is high.
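The scheme of per-pathway feature selection followed by kernel combination can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy data, the pathway names, the signal-to-noise score used as the "statistical score", and the kernel-target alignment heuristic used to weight the kernels. The alignment heuristic is a simple stand-in for a multiple kernel learning solver, not the method used in the article.

```python
import math
from statistics import mean, pstdev

def snr_score(values, y):
    """Signal-to-noise score of one feature for labels in {+1, -1} (assumed score)."""
    pos = [v for v, label in zip(values, y) if label == 1]
    neg = [v for v, label in zip(values, y) if label == -1]
    return abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + 1e-12)

def select_features(X, y, cols, k):
    """Keep the k columns of `cols` with the highest score."""
    def column(c):
        return [row[c] for row in X]
    return sorted(cols, key=lambda c: snr_score(column(c), y), reverse=True)[:k]

def linear_kernel(X, cols):
    """Gram matrix computed from the feature columns in `cols` only."""
    return [[sum(a[c] * b[c] for c in cols) for b in X] for a in X]

def alignment(K, y):
    """Kernel-target alignment <K, yy^T> / (||K||_F * ||yy^T||_F), with ||yy^T||_F = n."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return num / (norm_k * n) if norm_k > 0 else 0.0

def combine(kernels, y):
    """Weight each kernel by its (non-negative) alignment and return the weighted sum."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(y)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined

# Toy data: 4 patients, 4 genes. Genes 0-1 track the label, genes 2-3 are noise.
X = [[1.0, 0.9, 0.2, -0.1],
     [0.8, 1.1, -0.3, 0.4],
     [-1.0, -0.9, 0.1, 0.3],
     [-0.9, -1.2, -0.2, -0.4]]
y = [1, 1, -1, -1]

# Hypothetical pathway membership: pathway name -> gene column indices.
pathways = {"pathway_A": [0, 1], "pathway_B": [2, 3]}

selected = {name: select_features(X, y, cols, k=1) for name, cols in pathways.items()}
kernels = [linear_kernel(X, cols) for cols in selected.values()]
weights, K = combine(kernels, y)
```

In this toy setting the informative pathway receives most of the weight. The combined matrix `K` could then be passed to any classifier that accepts a precomputed kernel.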

Results

We use the METABRIC dataset for breast cancer, with prediction of survival at 2000 days from diagnosis. Predictive accuracy is improved by using kernels whose features are restricted to genes that are known members of particular pathways. We show that further improvements can be made by using a range of additional kernels based on clinical covariates such as Estrogen Receptor (ER) status. Using this range of measures to improve prediction performance, we show that the test accuracy on new instances is nearly 80%, though predictions are only made on 69.2% of the patient cohort.
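The trade-off between accuracy and the fraction of patients for whom a prediction is made can be sketched as follows. This is an illustration only: the decision scores and labels are made-up values, and using the absolute decision score as the confidence is an assumption, not necessarily the confidence measure defined in the article.

```python
def cautious_predict(scores, threshold):
    """Predict +1 or -1 when |score| >= threshold; otherwise abstain (None)."""
    return [(1 if s > 0 else -1) if abs(s) >= threshold else None for s in scores]

def coverage_and_accuracy(preds, y_true):
    """Coverage = fraction of cases predicted; accuracy is measured on those cases only."""
    made = [(p, t) for p, t in zip(preds, y_true) if p is not None]
    coverage = len(made) / len(preds)
    accuracy = sum(p == t for p, t in made) / len(made) if made else float("nan")
    return coverage, accuracy

# Made-up classifier decision scores and true labels for six patients.
scores = [2.1, 0.2, -1.5, -0.1, 0.9, -1.8]
y_true = [1, -1, -1, -1, 1, 1]

# Threshold 0.0 predicts for everyone; threshold 0.5 abstains on low-confidence cases.
cov_all, acc_all = coverage_and_accuracy(cautious_predict(scores, 0.0), y_true)
cov_cautious, acc_cautious = coverage_and_accuracy(cautious_predict(scores, 0.5), y_true)
```

Raising the threshold lowers coverage and, when the low-confidence cases are the error-prone ones, raises accuracy on the cases that remain.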

Availability

https://github.com/jseoane/FSMKL

Contact: J.Seoane@bristol.ac.uk

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Seoane JA 

PROVIDER: S-EPMC3957070 | biostudies-literature | 2014 Mar

REPOSITORIES: biostudies-literature


Publications

A pathway-based data integration framework for prediction of disease progression.

José A. Seoane, Ian N. M. Day, Tom R. Gaunt, Colin Campbell

Bioinformatics (Oxford, England), 2013-10-24, issue 6



Similar Datasets

| S-EPMC2771581 | biostudies-literature
| S-EPMC3283562 | biostudies-literature
| S-EPMC9750948 | biostudies-literature
| S-EPMC6219086 | biostudies-literature
| S-EPMC7490629 | biostudies-literature
| S-EPMC6330278 | biostudies-literature
| S-EPMC10994553 | biostudies-literature
| S-EPMC4517887 | biostudies-literature
| S-EPMC10320144 | biostudies-literature
| S-EPMC5860498 | biostudies-literature