
Dataset Information


A Hybrid Clustering Algorithm for Identifying Cell Types from Single-Cell RNA-Seq Data.


ABSTRACT: Single-cell RNA sequencing (scRNA-seq) has recently brought new insight into cell differentiation processes and functional variation in cell subtypes from homogeneous cell populations. A lack of prior knowledge makes unsupervised machine learning methods, such as clustering, suitable for analyzing scRNA-seq data. However, there are several limitations to overcome, including high dimensionality, instability of clustering results, and the complexity of parameter tuning. In this study, we propose a method that combines structure entropy and k-nearest neighbors to identify cell subpopulations in scRNA-seq data. In contrast to existing clustering methods for identifying cell subtypes, minimizing structure entropy yields natural communities without the number of clusters having to be specified. To investigate the performance of our model, we applied it to eight scRNA-seq datasets and compared it with three existing methods (nonnegative matrix factorization, single-cell interpretation via multikernel learning, and the structural entropy minimization principle). The experimental results showed that our approach achieves, on average, better performance on these datasets than the benchmark methods.
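
The abstract names two ingredients, a k-nearest-neighbor graph over cells and structure entropy as the clustering objective, without giving formulas. The sketch below is only an illustration of how those two pieces can fit together, not the authors' implementation. It assumes the commonly used two-dimensional structure entropy of a graph partition (degree, volume, and cut terms), an unweighted symmetric kNN graph with Euclidean distance, and toy synthetic data. The function names `knn_graph` and `structure_entropy` are hypothetical, and the sketch only scores given partitions; it does not perform the minimization itself.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric, unweighted k-nearest-neighbor adjacency matrix (Euclidean).

    Assumption: rows of X are cells, columns are features (e.g. reduced
    expression values). Suitable only for small toy inputs (O(n^2) memory).
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest cells
    A = np.zeros(dist.shape, dtype=float)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                 # keep an edge if either end chose it

def structure_entropy(A, labels):
    """Two-dimensional structure entropy of the partition `labels` of graph A.

    For each cluster c with volume vol_c (sum of degrees) and cut_c edges
    leaving it, and total volume vol_G:
      H = - sum_i (d_i / vol_G) * log2(d_i / vol_c)
          - sum_c (cut_c / vol_G) * log2(vol_c / vol_G)
    """
    deg = A.sum(axis=1)
    vol_G = deg.sum()
    H = 0.0
    for c in np.unique(labels):
        inside = labels == c
        d = deg[inside]
        vol_c = d.sum()
        cut_c = A[inside][:, ~inside].sum()   # edges with one end outside c
        H -= (d / vol_G * np.log2(d / vol_c)).sum()
        H -= cut_c / vol_G * np.log2(vol_c / vol_G)
    return H

# Toy data (an assumption for illustration): two well-separated groups of cells.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(100, 1, (20, 5))])
A = knn_graph(X, k=3)

true_labels = np.repeat([0, 1], 20)      # matches the two groups
mixed_labels = np.tile([0, 1], 20)       # alternating labels, ignores structure
one_cluster = np.zeros(40, dtype=int)    # everything in a single cluster

h_true = structure_entropy(A, true_labels)
h_mixed = structure_entropy(A, mixed_labels)
h_one = structure_entropy(A, one_cluster)
print(h_true, h_mixed, h_one)
```

On this toy input the partition that follows the graph's community structure scores lower than both the alternating and the single-cluster partitions, which is the sense in which minimizing structure entropy can select the number of clusters without it being supplied.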

SUBMITTER: Zhu X 

PROVIDER: S-EPMC6409843 | biostudies-other | 2019 Jan

REPOSITORIES: biostudies-other


Publications

A Hybrid Clustering Algorithm for Identifying Cell Types from Single-Cell RNA-Seq Data.

Zhu Xiaoshu, Li Hong-Dong, Xu Yunpei, Guo Lilu, Wu Fang-Xiang, Duan Guihua, Wang Jianxin

Genes, 2019-01-29, issue 2



Similar Datasets

| S-EPMC7671411 | biostudies-literature
| S-EPMC6477982 | biostudies-literature
| S-EPMC5410170 | biostudies-literature
| S-EPMC10597635 | biostudies-literature
| S-EPMC8379521 | biostudies-literature
| S-EPMC7541255 | biostudies-literature
| S-EPMC8574980 | biostudies-literature
| S-EPMC10547911 | biostudies-literature
| S-EPMC9306045 | biostudies-literature
| S-EPMC6134335 | biostudies-literature