
Dataset Information


A comparison of graph- and kernel-based -omics data integration algorithms for classifying complex traits.


ABSTRACT: BACKGROUND: High-throughput sequencing data are widely collected and analyzed in the study of complex diseases, with the aim of improving human health. Well-studied algorithms mostly deal with a single data source and cannot fully exploit the potential of multi-omics data. A holistic understanding of human health and disease requires integrating multiple data sources. Several integration algorithms have been proposed; however, a comprehensive comparison of data integration algorithms for the classification of binary traits is currently lacking. RESULTS: In this paper, we focus on two common classes of integration algorithms: graph-based algorithms, which represent subjects as nodes and the relationships between them as edges, and kernel-based algorithms, which generate a classifier in feature space. We provide a comprehensive comparison of their performance in terms of several measures of classification accuracy and computation time. Seven integration algorithms (graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM), and Ada-boost relevance vector machine) are compared and evaluated on hypertension and two cancer data sets. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. Graph-based algorithms have the advantage of being computationally faster. CONCLUSIONS: The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources.

SUBMITTER: Yan KK 

PROVIDER: S-EPMC6389230 | biostudies-literature | 2017 Dec

REPOSITORIES: biostudies-literature


Publications

A comparison of graph- and kernel-based -omics data integration algorithms for classifying complex traits.

Yan Kang K, Zhao Hongyu, Pang Herbert

BMC bioinformatics 20171206 1



Similar Datasets

| S-EPMC6471546 | biostudies-literature
| S-EPMC8384175 | biostudies-literature
| S-EPMC5834629 | biostudies-literature
| S-EPMC6773870 | biostudies-literature
| S-EPMC4959391 | biostudies-literature
| S-EPMC8664198 | biostudies-literature
| S-EPMC7464481 | biostudies-literature
| S-EPMC3283887 | biostudies-other
| S-EPMC3427351 | biostudies-literature