Dataset Information

cola: an R/Bioconductor package for consensus partitioning through a general framework.

ABSTRACT: Classification of high-throughput genomic data is a powerful method to assign samples to subgroups with specific molecular profiles. Consensus partitioning is the most widely applied approach to reveal subgroups by summarizing a consensus classification from a list of individual classifications generated by repeatedly executing clustering on random subsets of the data. It is able to evaluate the stability of the classification. We implemented a new R/Bioconductor package, cola, that provides a general framework for consensus partitioning. With cola, various parameters and methods can be user-defined and easily integrated into different steps of an analysis, e.g., feature selection, sample classification or defining signatures. cola provides a new method named ATC (ability to correlate to other rows) to extract features and recommends spherical k-means clustering (skmeans) for subgroup classification. We show that ATC and skmeans have better performance than other commonly used methods by a comprehensive benchmark on public datasets. We also benchmark key parameters in the consensus partitioning procedure, which helps users to select optimal parameter values. Moreover, cola provides rich functionalities to apply multiple partitioning methods in parallel and directly compare their results, as well as rich visualizations. cola can automate the complete analysis and generates a comprehensive HTML report.
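
The consensus partitioning idea described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R, uses only the standard library, and is not cola's implementation. The helper names (`kmeans`, `consensus_matrix`, `pac`), the choice to subsample features rather than samples, the plain k-means with farthest-point initialization (standing in for skmeans), and the toy data are all assumptions made for illustration. PAC (proportion of ambiguous clustering) is shown as one common way to summarize stability; the abstract does not say which stability measures cola reports.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: farthest-point initialization, chosen here only to keep
    the toy example deterministic in its outcome.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(samples, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of runs in which each pair of samples lands in the same cluster.

    Each run clusters the samples on a random subset of features
    (hypothetical default: 80% of features per run).
    """
    rng = random.Random(seed)
    n = len(samples)
    n_feat = len(samples[0])
    m = max(1, int(frac * n_feat))
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        feats = rng.sample(range(n_feat), m)
        sub = [tuple(s[f] for f in feats) for s in samples]
        labels = kmeans(sub, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


def pac(cm, lower=0.1, upper=0.9):
    """Proportion of sample pairs whose consensus value is ambiguous."""
    n = len(cm)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for i, j in pairs if lower < cm[i][j] < upper)
    return ambiguous / len(pairs)


# Toy data (made up): six samples, five features, two well-separated groups.
samples = [
    (0.0, 0.1, 0.2, 0.0, 0.1),
    (0.2, 0.0, 0.1, 0.3, 0.0),
    (0.1, 0.3, 0.0, 0.1, 0.2),
    (10.0, 10.1, 9.9, 10.2, 10.0),
    (9.8, 10.0, 10.1, 9.9, 10.3),
    (10.1, 9.9, 10.0, 10.0, 9.7),
]
cm = consensus_matrix(samples, k=2)
print("same group:", cm[0][1], "different groups:", cm[0][3], "PAC:", pac(cm))
```

Consensus values near 1 or 0 indicate sample pairs that are consistently grouped together or apart across runs, while values in between indicate an unstable classification. In the package itself, the clustering method, the feature selection step (for example ATC) and the number of subgroups are configurable, as the abstract describes.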

SUBMITTER: Gu Z

PROVIDER: S-EPMC7897501 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature

Publications

cola: an R/Bioconductor package for consensus partitioning through a general framework.

Zuguang Gu, Matthias Schlesner, Daniel Hübschmann

Nucleic Acids Research, 2021 Feb 1, issue 3

PMID: 33275159

Similar Datasets

NetPathMiner: R/Bioconductor package for network path mining through gene expression. | S-EPMC4609018 | biostudies-literature

BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem. | S-EPMC6584971 | biostudies-literature

clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets. | S-EPMC6138422 | biostudies-literature

BPRMeth: a flexible Bioconductor package for modelling methylation profiles. | S-EPMC6041802 | biostudies-other

GSAR: Bioconductor package for Gene Set analysis in R. | S-EPMC5259853 | biostudies-literature

rnaSeqMap: a Bioconductor package for RNA sequencing data exploration. | S-EPMC3128033 | biostudies-literature

tRanslatome: an R/Bioconductor package to portray translational control. | S-EPMC3892686 | biostudies-literature

Genetic association testing using the GENESIS R/Bioconductor package. | S-EPMC7904076 | biostudies-literature

FlowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data. | S-EPMC2777013 | biostudies-literature

BioNAR: an integrated biological network analysis package in bioconductor. | S-EPMC10582516 | biostudies-literature