
Dataset Information


Evaluation of single-cell classifiers for single-cell RNA sequencing data sets.


ABSTRACT: Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning 'unassigned' labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
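
The "simple ensemble voting" mentioned in the abstract can be illustrated with a minimal sketch. It assumes each tool has already produced one label per cell, in the same cell order. The function name `majority_vote`, the tool keys, and the rule that ties fall back to 'unassigned' are hypothetical choices made for illustration; the abstract does not specify how the authors implemented voting or resolved ties.

```python
from collections import Counter


def majority_vote(predictions, unassigned="unassigned"):
    """Combine per-cell labels from several classifiers by plurality vote.

    predictions: dict mapping a tool name to a list of predicted labels,
                 one label per cell, all lists in the same cell order.
    Ties between the top labels are reported as `unassigned`
    (an assumption of this sketch, not taken from the article).
    """
    if not predictions:
        raise ValueError("at least one tool's predictions are required")

    tools = list(predictions)
    n_cells = len(predictions[tools[0]])
    if any(len(predictions[t]) != n_cells for t in tools):
        raise ValueError("all tools must label the same number of cells")

    consensus = []
    for i in range(n_cells):
        # Count how many tools voted for each label on this cell.
        votes = Counter(predictions[t][i] for t in tools)
        ranked = votes.most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            # No single label has the most votes.
            consensus.append(unassigned)
        else:
            consensus.append(ranked[0][0])
    return consensus


# Hypothetical labels from three tools for three cells.
example = {
    "tool_a": ["T cell", "B cell", "NK cell"],
    "tool_b": ["T cell", "B cell", "T cell"],
    "tool_c": ["T cell", "Monocyte", "B cell"],
}
print(majority_vote(example))
```

In this toy input the third cell receives three different labels, so the sketch marks it 'unassigned' rather than picking one arbitrarily.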

SUBMITTER: Zhao X 

PROVIDER: S-EPMC7947964 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature


Publications

Evaluation of single-cell classifiers for single-cell RNA sequencing data sets.

Zhao Xinlei, Wu Shuang, Fang Nan, Sun Xiao, Fan Jue

Briefings in Bioinformatics, 2020-09-01, issue 5



Similar Datasets

| S-EPMC8921632 | biostudies-literature
| S-EPMC11897155 | biostudies-literature
2019-11-27 | GSE128982 | GEO
| S-EPMC7500689 | biostudies-literature
| S-EPMC9575221 | biostudies-literature
| S-EPMC10638755 | biostudies-literature
| S-EPMC8187165 | biostudies-literature
| S-EPMC6720041 | biostudies-literature
| S-EPMC5596896 | biostudies-literature
| S-EPMC7279618 | biostudies-literature