Dataset Information

Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs.

ABSTRACT:

Motivation

Hi-C technology provides insights into the 3D organization of chromatin, and the single-cell Hi-C method enables researchers to study the chromatin state at the level of individual cells. Single-cell Hi-C interaction matrices are high-dimensional and very sparse. To cluster thousands of single-cell Hi-C interaction matrices, they are flattened and compiled into one matrix. Depending on the resolution, this matrix can have a few million or even billions of features, so computations can be memory-intensive. We present a single-cell Hi-C clustering approach that uses an approximate nearest neighbors method based on locality-sensitive hashing to reduce the dimensionality and the required computational resources.
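The following is a minimal pure-Python sketch of the two steps described above: flattening each cell's sparse contact map into one feature vector, then building an approximate k-nearest neighbor graph with locality-sensitive hashing. The abstract does not name the hash family, so this sketch assumes MinHash over the set of non-zero contacts with standard LSH banding. All function names, parameters (`k`, `n_bands`, `rows_per_band`) and the toy data are hypothetical; this is not the API of scHiCExplorer or sparse-neighbors-search.

```python
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any feature index used here


def flatten_cell(contacts, n_bins):
    """Flatten one cell's sparse contact map {(i, j): count} into the set of
    non-zero feature indices i * n_bins + j (upper triangle only, since Hi-C
    matrices are symmetric). Assumption: counts are dropped, MinHash uses sets."""
    return {min(i, j) * n_bins + max(i, j) for (i, j), c in contacts.items() if c > 0}


def minhash_signature(features, hash_params):
    """One MinHash value per hash function h(x) = (a * x + b) mod PRIME."""
    if not features:
        return [PRIME] * len(hash_params)
    return [min((a * x + b) % PRIME for x in features) for a, b in hash_params]


def approximate_knn_graph(cells, n_bins, k=3, n_bands=8, rows_per_band=4, seed=0):
    """Hypothetical helper: map each cell index to up to k approximate neighbors."""
    rng = random.Random(seed)
    n_hashes = n_bands * rows_per_band
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_hashes)]
    signatures = [minhash_signature(flatten_cell(c, n_bins), hash_params) for c in cells]

    def band_keys(sig):
        # A band is a slice of rows_per_band consecutive MinHash values.
        return [(b, tuple(sig[b * rows_per_band:(b + 1) * rows_per_band])) for b in range(n_bands)]

    # LSH banding: cells that agree on a whole band land in the same bucket.
    buckets = defaultdict(set)
    for cell_id, sig in enumerate(signatures):
        for key in band_keys(sig):
            buckets[key].add(cell_id)

    graph = {}
    for cell_id, sig in enumerate(signatures):
        candidates = set().union(*(buckets[key] for key in band_keys(sig)))
        candidates.discard(cell_id)
        # Only candidates are compared: rank them by estimated Jaccard
        # similarity (number of matching MinHash values) and keep the top k.
        ranked = sorted(candidates, key=lambda o: -sum(x == y for x, y in zip(sig, signatures[o])))
        graph[cell_id] = ranked[:k]
    return graph


# Toy data: two groups of five cells with contacts in disjoint genomic regions;
# each cell misses one contact of its group's base pattern.
n_bins = 50
base_a = list((i, i + 1) for i in range(0, 24))
base_b = list((i, i + 1) for i in range(25, 49))
cells = [{p: 1 for idx, p in enumerate(base) if idx != c}
         for base in (base_a, base_b) for c in range(5)]

graph = approximate_knn_graph(cells, n_bins, k=3)
print(graph)
```

The design choice behind LSH is that exact similarities are computed only within shared buckets instead of over all cell pairs and all flattened features, which is what keeps memory proportional to the short signatures rather than to the full feature space.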

Results

The presented method can process a single-cell Hi-C dataset of 2600 cells at 10 kb resolution using 40 GB of memory, whereas competing approaches cannot be computed even with 1 TB of memory. Moreover, the method differentiates cells by their chromatin folding properties better than competing algorithms and therefore achieves a higher clustering quality on single-cell Hi-C data.

Availability and implementation

The presented clustering algorithm is part of scHiCExplorer, which is available on GitHub at https://github.com/joachimwolff/scHiCExplorer and as a conda package via the bioconda channel. The approximate nearest neighbors implementation is available at https://github.com/joachimwolff/sparse-neighbors-search and also as a conda package via the bioconda channel.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Wolff J

PROVIDER: S-EPMC9502147 | biostudies-literature | 2021 Nov

REPOSITORIES: biostudies-literature


Publications

Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs.

Joachim Wolff, Rolf Backofen, Björn Grüning

Bioinformatics (Oxford, England), 2021 Nov 1, issue 22


PMID: 34021764


Similar Datasets

Differential abundance testing on single-cell data using k-nearest neighbor graphs.
| S-EPMC7617075 | biostudies-literature

Single-cell and Spatial Transcriptomics Clustering with an Optimized Adaptive K-Nearest Neighbor Graph.
| S-EPMC10614787 | biostudies-literature

aKNNO: single-cell and spatial transcriptomics clustering with an optimized adaptive k-nearest neighbor graph.
| S-EPMC11293182 | biostudies-literature

Approximate nearest neighbor graph provides fast and efficient embedding with applications for large-scale biological data.
| S-EPMC11655291 | biostudies-literature

Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing.
| S-EPMC6173621 | biostudies-literature

Fast open modification spectral library searching through approximate nearest neighbor indexing
2021-05-25 | PXD009861 | Pride

Robust single-cell Hi-C clustering by convolution- and random-walk-based imputation.
| S-EPMC6628819 | biostudies-literature

Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes.
| S-EPMC6753955 | biostudies-literature

Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.
| S-EPMC8709870 | biostudies-literature

Single-cell gene set scoring with nearest neighbor graph smoothed data (gssnng).
| S-EPMC10599965 | biostudies-literature