Dataset Information

Fast and interpretable genomic data analysis using multiple approximate kernel learning.

ABSTRACT:

Motivation

Dataset sizes in computational biology have increased drastically with the help of improved data collection tools and growing patient cohorts. Previous kernel-based machine learning algorithms proposed for increased interpretability started to fail with large sample sizes, owing to their lack of scalability. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from the genomic data while conjointly learning a model for out-of-sample prediction. It scales with increasing sample size by approximating the distinct kernel matrices instead of calculating them.
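The two ingredients named in this paragraph, kernel approximation and a group Lasso penalty, can be illustrated with a small sketch. This is not the MAKL implementation: it assumes random Fourier features as the approximation scheme for a Gaussian kernel, and every function name and parameter below (`rff_map`, `group_features`, `group_soft_threshold`, `sigma`, `lam`) is hypothetical and chosen only for illustration.

```python
import math
import random


def rff_map(dim, n_components, sigma, seed):
    """Return z such that z(x) . z(y) roughly equals exp(-||x - y||^2 / (2 sigma^2)).

    Assumption: random Fourier features stand in for "kernel approximation".
    """
    rng = random.Random(seed)
    # Frequencies drawn from N(0, 1/sigma^2), phases uniform on [0, 2*pi).
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
         for _ in range(n_components)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
    scale = math.sqrt(2.0 / n_components)

    def z(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
                for w, b_j in zip(W, b)]

    return z


def group_features(x, groups, maps):
    """Apply one feature map per feature subset (e.g. one per gene set)."""
    return [z([x[i] for i in idx]) for idx, z in zip(groups, maps)]


def group_soft_threshold(blocks, lam):
    """Proximal step of a group Lasso penalty.

    Each weight block is shrunk towards zero; blocks whose Euclidean norm
    is at most lam are set to zero entirely, which drops that feature subset.
    """
    out = []
    for w in blocks:
        norm = math.sqrt(sum(v * v for v in w))
        factor = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([factor * v for v in w])
    return out


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value, used only to check the approximation."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))
```

In this sketch the explicit feature vectors have a fixed length per group, so the cost of building them grows linearly with the number of samples, whereas an exact kernel matrix grows quadratically. The group-wise shrinkage is what would make whole gene sets drop out of a model trained on such features.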

Results

To test our computational framework, namely Multiple Approximate Kernel Learning (MAKL), we ran experiments on three cancer datasets and showed that MAKL is capable of outperforming the baseline algorithm while using only a small fraction of the input features. We also reported selection frequencies of approximated kernel matrices associated with feature subsets (i.e. gene sets/pathways), which help to show their relevance for the given classification task. Our fast and interpretable MKL algorithm, which produces sparse solutions, is promising for computational biology applications given its scalability and the highly correlated structure of genomic datasets, and it can be used to discover new biomarkers and new therapeutic guidelines.

Availability and implementation

MAKL is available at https://github.com/begumbektas/makl together with the scripts that replicate the reported experiments. MAKL is also available as an R package at https://cran.r-project.org/web/packages/MAKL.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Bektas AB

PROVIDER: S-EPMC9235505 | biostudies-literature | 2022 Jun

REPOSITORIES: biostudies-literature

Publications

Fast and interpretable genomic data analysis using multiple approximate kernel learning.

Ayyüce Begüm Bektaş, Çiğdem Ak, Mehmet Gönen

Bioinformatics (Oxford, England), 2022 Jun 1, Suppl 1

PMID: 35758810

Similar Datasets

Multiple-kernel learning for genomic data mining and prediction. | S-EPMC6694479 | biostudies-literature

Classifying Breast Cancer Subtypes Using Multiple Kernel Learning Based on Omics Data. | S-EPMC6471546 | biostudies-literature

COmic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data. | S-EPMC10311322 | biostudies-literature

Supervised multiple kernel learning approaches for multi-omics data integration. | S-EPMC11585117 | biostudies-literature

Fast approximate hierarchical clustering using similarity heuristics. | S-EPMC2561018 | biostudies-literature

L2-norm multiple kernel learning and its application to biomedical data fusion. | S-EPMC2906488 | biostudies-literature

Orthogonalized Kernel Debiased Machine Learning for Multimodal Data Analysis. | S-EPMC10530774 | biostudies-literature

Fast and interpretable consensus clustering via minipatch learning. | S-EPMC9560608 | biostudies-literature

PIMKL: Pathway-Induced Multiple Kernel Learning. | S-EPMC6401099 | biostudies-literature

Fast and accurate read mapping with approximate seeds and multiple backtracking. | S-EPMC3627565 | biostudies-literature