Project description: Medullary thymic epithelial cells (mTECs) are critical for inducing self-tolerance in T cells through promiscuous expression of tissue-specific antigens (TSAs), which is controlled by the transcriptional regulator AIRE. Whereas AIRE-expressing (Aire+) mTECs undergo constant turnover in the adult thymus, the mechanisms underlying differentiation of postnatal mTECs remain to be discovered. Integrative analysis of single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) and single-cell RNA sequencing (scRNA-seq) data suggested the presence of proliferating mTECs with a specific chromatin structure that express high levels of Aire and the co-stimulatory molecule CD80 (Aire+CD80hi). Proliferating Aire+CD80hi mTECs detected using Fucci technology express a minimal number of Aire-dependent TSAs and are converted into quiescent Aire+CD80hi mTECs expressing high levels of TSAs after transit amplification. These data provide evidence for the existence of transit-amplifying Aire+ mTEC precursors during Aire+ mTEC differentiation in the postnatal thymus.
Project description: Motivation: scATAC-seq has enabled chromatin accessibility landscape profiling at the single-cell level, providing opportunities for determining cell-type-specific regulation codes. However, the high dimensionality, extreme sparsity, and large scale of scATAC-seq data pose great challenges to cell-type identification. Thus, there has been growing interest in leveraging well-annotated scRNA-seq data to help annotate scATAC-seq data. However, substantial computational obstacles remain in transferring information from scRNA-seq to scATAC-seq, especially because their features are heterogeneous. Results: We propose a new transfer learning method, scNCL, which utilizes prior knowledge and contrastive learning to tackle the problem of heterogeneous features. Briefly, scNCL transforms scATAC-seq features into a gene activity matrix based on prior knowledge. Since feature transformation can cause information loss, scNCL introduces neighborhood contrastive learning to preserve the neighborhood structure of scATAC-seq cells in the raw feature space. To learn transferable latent features, scNCL uses a feature projection loss and an alignment loss to harmonize embeddings between scRNA-seq and scATAC-seq. Experiments on various datasets demonstrated that scNCL not only realizes accurate and robust label transfer for common types, but also achieves reliable detection of novel types. scNCL is also computationally efficient and scalable to million-scale datasets. Moreover, we prove scNCL can help refine cell-type annotations in existing scATAC-seq atlases. Availability and implementation: The source code and data used in this paper can be found at https://github.com/CSUBioGroup/scNCL-release.
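The feature transformation step described above can be illustrated with a minimal sketch. The description does not say how scNCL builds its gene activity matrix, so the rule used here (summing the counts of peaks that overlap a gene body plus a fixed upstream window) is an assumption borrowed from common practice, and the function name `gene_activity_matrix`, the `upstream` parameter, and the toy coordinates are all hypothetical.

```python
import numpy as np

def gene_activity_matrix(peak_counts, peaks, genes, upstream=2000):
    """Sum peak counts over each gene body plus an upstream window.

    peak_counts: (n_cells, n_peaks) array of accessibility counts.
    peaks: list of (chrom, start, end) tuples, one per peak column.
    genes: list of (name, chrom, start, end, strand) tuples.
    upstream: assumed promoter window in base pairs (hypothetical default).
    """
    # Binary peak-to-gene assignment matrix of shape (n_peaks, n_genes).
    assign = np.zeros((len(peaks), len(genes)), dtype=peak_counts.dtype)
    for g, (_, chrom, start, end, strand) in enumerate(genes):
        # Extend the interval upstream of the TSS, respecting strand.
        lo = start - upstream if strand == "+" else start
        hi = end if strand == "+" else end + upstream
        for p, (p_chrom, p_start, p_end) in enumerate(peaks):
            # Half-open interval overlap test on the same chromosome.
            if p_chrom == chrom and p_start < hi and p_end > lo:
                assign[p, g] = 1
    # Each gene's activity is the sum of counts from its assigned peaks.
    return peak_counts @ assign

# Toy data: 2 cells, 3 peaks, 2 genes (all values are made up).
peaks = [("chr1", 100, 600), ("chr1", 5000, 5500), ("chr2", 100, 600)]
genes = [("GeneA", "chr1", 2000, 4000, "+"), ("GeneB", "chr2", 50, 900, "-")]
counts = np.array([[1, 0, 2],
                   [0, 3, 1]])
activity = gene_activity_matrix(counts, peaks, genes)
```

Because several peaks collapse into one gene column and peaks far from any gene are dropped (the second peak here contributes to neither gene), the transformed matrix carries less information than the raw peak matrix, which is the loss that the neighborhood contrastive term is described as compensating for.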
Project description: Single-cell multi-omics techniques, which enable the simultaneous measurement of multiple modalities such as RNA gene expression and Assay for Transposase-Accessible Chromatin (ATAC) within individual cells, have become a powerful tool for deciphering the intricate complexity of cellular systems. Most current methods rely on motif databases to establish cross-modality relationships between genes from RNA-seq data and peaks from ATAC-seq data. However, these approaches are constrained by incomplete database coverage, particularly for novel or poorly characterized relationships. To address these limitations, we introduce single-cell Multi-omics Integration (scMI), a heterogeneous graph embedding method that encodes both cells and modality features from single-cell RNA-seq and ATAC-seq data into a shared latent space by learning cross-modality relationships. By modeling cells and modality features as distinct node types, we design an inter-type attention mechanism to effectively capture long-range cross-modality interactions between genes and peaks. Benchmark results demonstrate that embeddings learned by scMI preserve more biological information and achieve comparable or superior performance in downstream tasks including modality prediction, cell clustering, and gene regulatory network inference compared to methods that rely on databases. Furthermore, scMI significantly improves the alignment and integration of unmatched multi-omics data, enabling more accurate embedding and improved outcomes in downstream tasks.
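As a rough illustration of attention between two node types, the sketch below lets gene nodes attend over peak nodes using generic scaled dot-product attention. It is not scMI's actual architecture, which the description does not specify; the function `inter_type_attention`, the projection matrices, the embedding size, and the random inputs are all assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inter_type_attention(h_query, h_key, w_q, w_k, w_v):
    """Update nodes of one type by attending over nodes of another type.

    h_query: (n_query, d) embeddings of the attending node type (e.g. genes).
    h_key:   (n_key, d) embeddings of the attended node type (e.g. peaks).
    w_q, w_k, w_v: (d, d) projections (hypothetical, untrained here).
    """
    q = h_query @ w_q
    k = h_key @ w_k
    v = h_key @ w_v
    # Every query node scores every key node, with no distance cutoff.
    scores = q @ k.T / np.sqrt(q.shape[1])
    weights = softmax(scores, axis=1)
    return weights @ v, weights

# Toy embeddings: 5 gene nodes and 12 peak nodes in an 8-dimensional space.
rng = np.random.default_rng(0)
d = 8
gene_emb = rng.normal(size=(5, d))
peak_emb = rng.normal(size=(12, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
gene_update, attn = inter_type_attention(gene_emb, peak_emb, w_q, w_k, w_v)
```

Since each gene scores every peak directly, a gene can place weight on a peak regardless of genomic distance or prior database annotation, which is one way to read the "long-range cross-modality interactions" mentioned above.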