Dataset Information

Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics.

ABSTRACT:

Background

Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. But a major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of 10^4 to 10^5 for each chromosome.

Results

By assuming that the similarity between physically distant objects is negligible, we are able to propose an implementation of adjacency-constrained HAC with quasi-linear complexity. This is achieved by pre-calculating specific sums of similarities, and storing candidate fusions in a min-heap. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds.
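The heap-driven merging described above can be illustrated with a minimal Python sketch. It is not the adjclust implementation: the function name `adjacency_constrained_hac` is hypothetical, a Ward-style linkage computed from similarities is assumed, and dense 2-D prefix sums stand in for the band-restricted pre-calculated sums, so this version still uses quadratic memory. What it does show is the adjacency constraint and the min-heap of candidate fusions with lazy invalidation of stale entries.

```python
import heapq
import numpy as np

def adjacency_constrained_hac(sim):
    """Hypothetical sketch: merge only neighbouring clusters, cheapest first.

    Returns a list of merges (a, m, b, cost): clusters [a, m) and [m, b)
    were fused into [a, b) at the given Ward-style cost.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # Dense 2-D prefix sums (assumption: replaces the band-limited sums).
    prefix = np.zeros((n + 1, n + 1))
    prefix[1:, 1:] = sim.cumsum(axis=0).cumsum(axis=1)

    def block_sum(a, b, c, d):
        # Sum of sim[a:b, c:d] in constant time.
        return prefix[b, d] - prefix[a, d] - prefix[b, c] + prefix[a, c]

    def ward_cost(a, m, b):
        # Ward-style dissimilarity between [a, m) and [m, b), from similarities.
        n1, n2 = m - a, b - m
        s11 = block_sum(a, m, a, m)
        s22 = block_sum(m, b, m, b)
        s12 = block_sum(a, m, m, b)
        within = s11 / n1**2 + s22 / n2**2 - 2.0 * s12 / (n1 * n2)
        return float(n1 * n2 / (n1 + n2) * within)

    end = {i: i + 1 for i in range(n)}    # cluster start -> cluster end
    start = {i + 1: i for i in range(n)}  # cluster end -> cluster start
    heap = [(ward_cost(i, i + 1, i + 2), i, i + 1, i + 2) for i in range(n - 1)]
    heapq.heapify(heap)

    merges = []
    while heap and len(merges) < n - 1:
        cost, a, m, b = heapq.heappop(heap)
        # Lazy deletion: skip candidates whose clusters have since changed.
        if end.get(a) != m or end.get(m) != b:
            continue
        merges.append((a, m, b, cost))
        end[a] = b
        del end[m], start[m]
        start[b] = a
        if a > 0:  # new candidate fusion with the left neighbour
            left = start[a]
            heapq.heappush(heap, (ward_cost(left, a, b), left, a, b))
        if b < n:  # new candidate fusion with the right neighbour
            right_end = end[b]
            heapq.heappush(heap, (ward_cost(a, b, right_end), a, b, right_end))
    return merges
```

Each merge adds at most two new candidates to the heap, so the heap holds a linear number of entries; stale candidates are discarded when popped rather than removed eagerly. Restricting the pre-calculated sums to a band around the diagonal is what would bring the memory cost down from quadratic.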

Availability and implementation

Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).

SUBMITTER: Ambroise C

PROVIDER: S-EPMC6857244 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics.

Ambroise C, Dehman A, Neuvial P, Rigaill G, Vialaneix N

Algorithms for Molecular Biology: AMB, 2019-11-15

PMID: 31807137

Similar Datasets

Fast approximate hierarchical clustering using similarity heuristics.
| S-EPMC2561018 | biostudies-literature

Data integration by fuzzy similarity-based hierarchical clustering.
| S-EPMC7446192 | biostudies-literature

Spectral clustering based on learning similarity matrix.
| S-EPMC6454479 | biostudies-literature

Band-based similarity indices for gene expression classification and clustering.
| S-EPMC8566472 | biostudies-literature

AnatomiCuts: Hierarchical clustering of tractography streamlines based on anatomical similarity.
| S-EPMC6152885 | biostudies-literature

SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization.
| S-EPMC8345837 | biostudies-literature

Similarity maps and hierarchical clustering for annotating FT-IR spectral images.
| S-EPMC4225570 | biostudies-literature

CHAI: Consensus Clustering Through Similarity Matrix Integration for Cell-Type Identification.
| S-EPMC10983883 | biostudies-literature

CHAI: consensus clustering through similarity matrix integration for cell-type identification.
| S-EPMC11359802 | biostudies-literature

On the Adjacency Matrix of RyR2 Cluster Structures.
| S-EPMC4636394 | biostudies-literature