
Dataset Information


Spectrum: fast density-aware spectral clustering for single and multi-omic data.


ABSTRACT: Motivation: Clustering patient omic data is integral to developing precision medicine because it allows the identification of disease subtypes. A current major challenge is the integration of multi-omic data to identify a shared structure and reduce noise. Cluster analysis is also increasingly applied to single-omic data, for example, in single-cell RNA-seq analysis for clustering the transcriptomes of individual cells. This technology has clinical implications. Our motivation was therefore to develop a flexible and effective spectral clustering tool for both single and multi-omic data.

Results: We present Spectrum, a new spectral clustering method for complex omic data. Spectrum uses a self-tuning density-aware kernel we developed that enhances the similarity between points that share common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to reduce noise and reveal underlying structures. Spectrum contains a new method for finding the optimal number of clusters (K) involving eigenvector distribution analysis. Spectrum can automatically find K for both Gaussian and non-Gaussian structures. We demonstrate across 21 real expression datasets that Spectrum gives improved runtimes and better clustering results relative to other methods.
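As a rough illustration of the ideas named in the Results paragraph, the following Python sketch builds a self-tuning affinity matrix and estimates K from the spectrum of the normalised affinity. It is not the Spectrum implementation: it substitutes the locally scaled kernel of Zelnik-Manor and Perona for Spectrum's density-aware kernel, uses the plain eigengap heuristic rather than eigenvector distribution analysis, and omits the tensor product graph diffusion step entirely. The function names `self_tuning_affinity` and `estimate_k_eigengap`, the parameters `k` and `max_k`, and the toy data are all assumptions made for this example.

```python
import numpy as np


def self_tuning_affinity(X, k=7):
    """Locally scaled Gaussian affinity (Zelnik-Manor and Perona style).

    Assumption for this sketch: sigma_i is the distance from point i to its
    k-th nearest neighbour. This is NOT Spectrum's density-aware kernel.
    """
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column k is the k-th nearest neighbour.
    sigma = np.sort(d, axis=1)[:, k]
    A = np.exp(-(d ** 2) / np.outer(sigma, sigma))
    np.fill_diagonal(A, 0.0)  # no self-similarity
    return A


def estimate_k_eigengap(A, max_k=10):
    """Estimate the number of clusters from the largest eigengap.

    Uses the symmetric normalised affinity D^{-1/2} A D^{-1/2}; a graph with
    c well-separated components has c eigenvalues close to 1.
    """
    deg = np.maximum(A.sum(axis=1), 1e-12)  # guard against isolated points
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    M = A * np.outer(d_inv_sqrt, d_inv_sqrt)
    evals = np.linalg.eigvalsh(M)[::-1]  # descending order
    top = evals[: max_k + 1]
    gaps = top[:-1] - top[1:]
    k_hat = int(np.argmax(gaps)) + 1
    return k_hat, top


if __name__ == "__main__":
    # Hypothetical toy data: three Gaussian blobs in 2D.
    rng = np.random.default_rng(0)
    centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centres])
    k_hat, top = estimate_k_eigengap(self_tuning_affinity(X, k=7))
    print("estimated K:", k_hat)
    print("leading eigenvalues:", np.round(top, 3))
```

The local scale lets each point adapt its kernel width to its own neighbourhood, which is the general motivation behind self-tuning kernels. For real analyses, the R package described in this record should be used rather than this sketch.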

Availability and implementation: Spectrum is available as an R software package from CRAN https://cran.r-project.org/web/packages/Spectrum/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

SUBMITTER: John CR 

PROVIDER: S-EPMC7703791 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature


Publications

Spectrum: fast density-aware spectral clustering for single and multi-omic data.

Christopher R John, David Watson, Michael R Barnes, Costantino Pitzalis, Myles J Lewis

Bioinformatics (Oxford, England), 2020 Feb 1, issue 4



Similar Datasets

| S-EPMC9272806 | biostudies-literature
| S-EPMC5411077 | biostudies-literature
| S-EPMC10442428 | biostudies-literature
| S-EPMC9805570 | biostudies-literature
| S-EPMC6237755 | biostudies-literature
| S-EPMC9351124 | biostudies-literature
| S-EPMC7954949 | biostudies-literature
| S-EPMC8195153 | biostudies-literature
| S-EPMC7864438 | biostudies-literature
| S-EPMC11015955 | biostudies-literature