Multimodal hierarchical classification allows for efficient annotation of CITE-seq data
ABSTRACT: Single-cell RNA sequencing (scRNA-seq) is an invaluable tool for profiling cells in complex tissues and dissecting activation states that lack well-defined surface protein expression. For immune cells, the transcriptomic profile captured by scRNA-seq cannot always identify cell states and subsets defined by conventional flow cytometry. Emerging technologies have enabled multimodal sequencing of single cells, such as paired sequencing of the transcriptome and surface proteome by CITE-seq, but integrating these high-dimensional modalities for accurate cell type annotation remains a challenge in the field. Here, we describe a machine learning tool called MultiModal Classifier Hierarchy (MMoCHi) for the cell type annotation of CITE-seq data. Our classifier involves several steps: 1) we use landmark registration to remove batch-related staining artifacts in CITE-seq protein expression, 2) the user defines a hierarchy of classifications based on cell type similarity and ontology and provides markers (protein or gene expression) for the identification of ground truth populations within the dataset by threshold gating, 3) progressing through this user-defined hierarchy, we train a random forest classifier at each level using all available modalities (surface proteome and transcriptome data), and 4) we use these forests to predict cell types across the entire dataset. Applying MMoCHi to CITE-seq data of immune cells isolated from eight distinct tissue sites of two human organ donors yields high-purity cell type annotations encompassing the broad array of immune cell states in the dataset. This includes T and B cell memory subsets, macrophages and monocytes, and natural killer cells, as well as rare populations of plasmacytoid dendritic cells, innate T cells, and innate lymphoid cell subsets. We validate the use of feature importances extracted from the classifier hierarchy to select robust genes for improved identification of T cell memory subsets by scRNA-seq.
Together, MMoCHi provides a comprehensive system of tools for the batch correction and cell type annotation of CITE-seq data. Moreover, this tool provides flexibility in classification hierarchy design, allowing cell type annotations to reflect a researcher’s specific experimental design. This flexibility also renders MMoCHi readily extendable beyond immune cell annotation, and potentially adaptable to other sequencing modalities.
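The gating, training, and prediction steps described in the abstract can be illustrated with a minimal sketch. It is not the MMoCHi API: it uses scikit-learn's RandomForestClassifier on simulated data, and the marker names, the thresholds, the two-level hierarchy, and the helper functions (simulate, gate, classify) are all hypothetical choices made for illustration. Landmark registration (step 1) is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PROTEINS = ["CD19", "CD3", "CD4", "CD8"]  # assumed toy antibody panel

# Hypothetical two-level hierarchy: parent -> {child: gating rules on proteins}.
HIERARCHY = {
    "All": {
        "B cell": {"CD19": "pos", "CD3": "neg"},
        "T cell": {"CD3": "pos", "CD19": "neg"},
    },
    "T cell": {
        "CD4 T": {"CD4": "pos", "CD8": "neg"},
        "CD8 T": {"CD8": "pos", "CD4": "neg"},
    },
}


def simulate(n_per_type=150, n_genes=10, seed=0):
    """Simulate protein and RNA matrices for three toy cell types."""
    rng = np.random.default_rng(seed)
    profiles = {"B cell": [3, 0, 0, 0], "CD4 T": [0, 3, 3, 0], "CD8 T": [0, 3, 0, 3]}
    protein, rna, truth = [], [], []
    for k, (name, means) in enumerate(profiles.items()):
        protein.append(rng.normal(means, 1.0, size=(n_per_type, len(PROTEINS))))
        gene_means = np.zeros(n_genes)
        gene_means[k] = 2.0  # one informative gene per cell type
        rna.append(rng.normal(gene_means, 1.0, size=(n_per_type, n_genes)))
        truth += [name] * n_per_type
    return np.vstack(protein), np.vstack(rna), np.array(truth)


def gate(protein, rules, hi=2.0, lo=1.0):
    """Threshold gating: 'pos' markers must exceed hi, 'neg' markers fall below lo."""
    mask = np.ones(protein.shape[0], dtype=bool)
    for marker, direction in rules.items():
        values = protein[:, PROTEINS.index(marker)]
        mask &= (values > hi) if direction == "pos" else (values < lo)
    return mask


def classify(protein, rna, seed=0):
    """Walk the hierarchy, training one forest per level on both modalities."""
    features = np.hstack([protein, rna])  # surface proteome + transcriptome
    labels = np.full(protein.shape[0], "All", dtype=object)
    for parent, children in HIERARCHY.items():
        in_parent = labels == parent
        gates = {c: gate(protein, r) & in_parent for c, r in children.items()}
        n_hits = sum(g.astype(int) for g in gates.values())
        train_y = np.full(protein.shape[0], None, dtype=object)
        for child, g in gates.items():
            train_y[g & (n_hits == 1)] = child  # keep unambiguous cells only
        train = np.array([y is not None for y in train_y])
        forest = RandomForestClassifier(n_estimators=100, random_state=seed)
        forest.fit(features[train], train_y[train].astype(str))
        # Predict for every cell under this parent, gated or not.
        labels[in_parent] = forest.predict(features[in_parent])
    return labels.astype(str)


protein, rna, truth = simulate()
predicted = classify(protein, rna)
print("agreement with simulated labels:", (predicted == truth).mean())
```

In this sketch, only cells that pass exactly one gate serve as training examples at each level, and the fitted forest then labels every cell under that parent, including cells that passed no gate. Each fitted forest also exposes a feature_importances_ attribute, which is one way the feature importances mentioned in the abstract could be inspected.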
ORGANISM(S): Homo sapiens
PROVIDER: GSE229791 | GEO | 2023/08/08
REPOSITORIES: GEO