Project description:Single-cell RNA sequencing (scRNA-seq) data are noisy and sparse. Here, we show that transfer learning across datasets remarkably improves data quality. By coupling a deep autoencoder with a Bayesian model, SAVER-X extracts transferable gene-gene relationships across data from different labs, varying conditions and divergent species, to denoise new target datasets.
Project description:Since many single-cell RNA-seq (scRNA-seq) data are obtained after cell sorting, such as when investigating immune cells, tracking the cellular landscape by integrating single-cell data with spatial transcriptomic data is limited by the mismatch in cell types and cell composition between the two datasets. We developed a method, spSeudoMap, which utilizes sorted scRNA-seq data to create virtual cell mixtures that closely mimic the gene expression of spatial data and trains a domain adaptation model for predicting spatial cell compositions. The method was applied in brain and breast cancer tissues and accurately predicted the topography of cell subpopulations. spSeudoMap may help clarify the roles of a few but crucial cell types.
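The idea of building virtual cell mixtures from single-cell data can be illustrated with a minimal sketch. This is not the spSeudoMap implementation: the function name `make_virtual_mixtures`, its parameters, and the uniform random sampling of cells are assumptions made for illustration only, and the domain adaptation model is not shown.

```python
import numpy as np

def make_virtual_mixtures(counts, cell_types, n_mixtures, cells_per_mixture, seed=0):
    """Sum randomly drawn cells into virtual mixtures (hypothetical helper).

    counts:      (n_cells, n_genes) expression matrix
    cell_types:  (n_cells,) integer labels in [0, n_types)
    Returns the mixture expression and the known cell-type fractions,
    which could serve as training targets for a composition predictor.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = counts.shape
    n_types = int(cell_types.max()) + 1
    mixtures = np.zeros((n_mixtures, n_genes))
    fractions = np.zeros((n_mixtures, n_types))
    for i in range(n_mixtures):
        # Draw cells with replacement and pool their expression.
        idx = rng.integers(0, n_cells, size=cells_per_mixture)
        mixtures[i] = counts[idx].sum(axis=0)
        # Record the fraction of each cell type in this mixture.
        fractions[i] = np.bincount(cell_types[idx], minlength=n_types) / cells_per_mixture
    return mixtures, fractions

# Synthetic demonstration data (not from the study).
demo_rng = np.random.default_rng(1)
demo_counts = demo_rng.poisson(2.0, size=(200, 50)).astype(float)
demo_types = np.arange(200) % 3
mixtures, fractions = make_virtual_mixtures(demo_counts, demo_types,
                                            n_mixtures=10, cells_per_mixture=8)
```

Each row of `fractions` sums to one by construction, so the pairs `(mixtures, fractions)` form labeled examples whose composition is known exactly.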
Project description:Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
Project description:Spatial transcriptomics (ST) technologies have revolutionized our understanding of cellular ecosystems. However, these technologies face challenges such as sparse gene signals and limited gene detection capacities, which hinder their ability to capture comprehensive spatial gene expression profiles. To address these limitations, we propose leveraging single-cell RNA sequencing (scRNA-seq), which provides comprehensive gene expression data but lacks spatial context, to enrich ST profiles. Herein, we introduce SpaIM, an innovative style transfer learning model that utilizes scRNA-seq information to predict unmeasured gene expression in ST data, thereby improving gene coverage and expression. SpaIM segregates scRNA-seq and ST data into data-agnostic contents and data-specific styles, with the contents capturing the commonalities between the two data types and the styles highlighting their unique differences. By integrating the strengths of scRNA-seq and ST, SpaIM overcomes data sparsity and limited gene coverage issues, improving significantly on 12 existing methods. This improvement is demonstrated across 53 diverse ST datasets, spanning sequencing- and imaging-based spatial technologies in various tissue types. Additionally, SpaIM enhances downstream analyses, including the detection of ligand-receptor interactions, spatial domain characterization, and identification of differentially expressed genes. Released as open-source software, SpaIM increases accessibility for spatial transcriptomics analysis. In summary, SpaIM represents a pioneering approach to enrich spatial transcriptomics using scRNA-seq data, enabling precise gene expression imputation and advancing the field of spatial transcriptomics research.
Project description:The development of single-cell sequencing technologies has allowed researchers to gain important new knowledge about the expression profile of genes in thousands of individual cells of a model organism or tissue. A common disadvantage of this technology is the loss of the three-dimensional (3-D) structure of the cells. Consequently, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized the Single-Cell Transcriptomics Challenge, in which we participated, with the aim of addressing the following two problems: (a) to identify the top 60, 40, and 20 genes of the Drosophila melanogaster embryo that contain the most spatial information and (b) to reconstruct the 3-D arrangement of the embryo using information from those genes. We developed two independent techniques, based on the least absolute shrinkage and selection operator (Lasso) and on deep neural networks (NNs), which are applied to high-dimensional single-cell sequencing data in order to accurately identify genes that contain spatial information. Our first technique, Lasso.TopX, utilizes the Lasso and ranking statistics and allows a user to define a specific number of features they are interested in. The NN approach utilizes weak supervision for linear regression to accommodate uncertain or probabilistic training labels. We show, individually for both techniques, that we are able to identify a user-defined number of important, stable genes containing the most spatial information. The results from both techniques achieve high performance when reconstructing spatial information in D. melanogaster and also generalize to zebrafish (Danio rerio). Furthermore, we identified novel D. melanogaster genes that carry important positional information and were not previously suspected. We also show how the indirect use of the full datasets' information can lead to data leakage and to overestimation of the model's performance. Lastly, we discuss the applicability of our approaches to other feature selection problems outside the realm of single-cell sequencing and the importance of being able to handle probabilistic training labels. Our source code and detailed documentation are available at https://github.com/TJU-CMC-Org/SingleCell-DREAM/.
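One plausible way to combine the Lasso with ranking statistics so that a user obtains exactly a chosen number of features is sketched below. It is not the authors' Lasso.TopX procedure (see the linked repository for that): the function `lasso_top_x`, the subsampling scheme, and the ranking by selection frequency with mean absolute coefficient as tie-breaker are all assumptions for illustration, using scikit-learn's `Lasso`.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_top_x(X, y, top_x, alpha=0.1, n_repeats=20, subsample=0.8, seed=0):
    """Rank features by how often the Lasso selects them (hypothetical helper).

    X: (n_samples, n_features) predictors, e.g. gene expression
    y: (n_samples,) response, e.g. one spatial coordinate
    Returns the indices of the top_x ranked features.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_sub = int(subsample * n_samples)
    selected = np.zeros(n_features)   # times each feature got a non-zero coefficient
    abs_coef = np.zeros(n_features)   # accumulated |coefficient| for tie-breaking
    for _ in range(n_repeats):
        idx = rng.choice(n_samples, size=n_sub, replace=False)
        model = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx])
        selected += model.coef_ != 0
        abs_coef += np.abs(model.coef_)
    # np.lexsort uses the LAST key as the primary key:
    # sort by selection count (descending), then by |coefficient| (descending).
    order = np.lexsort((-abs_coef, -selected))
    return order[:top_x]

# Synthetic demonstration: only features 0, 1 and 2 carry signal.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(150, 30))
y_demo = (3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 1.5 * X_demo[:, 2]
          + 0.1 * demo_rng.normal(size=150))
top_features = lasso_top_x(X_demo, y_demo, top_x=3)
```

Ranking over repeated subsamples, rather than trusting a single fit, favors features that are selected stably, which matches the emphasis on stable genes in the description above.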
Project description:Recent advances in spatially-resolved transcriptomics have enabled profiling of gene expression in a spatial context, which has led to the generation of large-scale single-cell and spatial atlases with computationally-derived cell type or spatial domain labels. An increasingly important task with these data has become the transfer of cell type or spatial domain annotations from a given reference (or source) atlas into a new target tissue or sample. The reference and target datasets could be at different resolutions or measured on different experimental platforms. Here, we present a method to perform cross-platform transfer learning that takes as input single-cell or spatial domain labels from a reference atlas or dataset and transfers the labels to a target dataset at a similar or different resolution. Specifically, we use non-negative matrix factorization (NMF) on the reference data to identify factors associated with labels of interest and project these factors into the target dataset to label each new observation. We use a multinomial model with the factors as covariates and labels as the response to predict labels in the target dataset. In contrast to existing approaches, the advantage of our approach is interpretability, without compromising on accuracy. We demonstrate the performance of the method in two human brain tissues and show that our model identifies spatially coherent domains in the target datasets with concordance of marker gene expression. We implement spaTransfer in open-source software as an R package (github.com/cindyfang70/spaTransfer).
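The three steps described above (NMF on the reference, projection of the factors into the target, and a multinomial model on the factors) can be sketched with scikit-learn. This is an illustrative sketch, not the spaTransfer R package: `transfer_labels` and `simulate` are hypothetical names, the data are synthetic, and all factors are used as covariates rather than only those associated with the labels of interest.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression

def transfer_labels(ref_counts, ref_labels, target_counts, n_factors=5, seed=0):
    """Transfer labels from a reference to a target dataset (hypothetical helper)."""
    # 1. Learn non-negative factors on the reference data.
    nmf = NMF(n_components=n_factors, init="nndsvda", max_iter=500, random_state=seed)
    ref_factors = nmf.fit_transform(ref_counts)
    # 2. Project the target data onto the same factors (gene loadings stay fixed).
    target_factors = nmf.transform(target_counts)
    # 3. Multinomial (softmax) regression: factors as covariates, labels as response.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(ref_factors, ref_labels)
    return clf.predict(target_factors), nmf, clf

def simulate(n_obs, rng):
    """Two synthetic labels, each with its own highly expressed block of 10 genes."""
    labels = rng.integers(0, 2, size=n_obs)
    high_first = np.r_[np.full(10, 5.0), np.full(10, 0.2)]
    high_second = np.r_[np.full(10, 0.2), np.full(10, 5.0)]
    rates = np.where(labels[:, None] == 0, high_first, high_second)
    return rng.poisson(rates).astype(float), labels

demo_rng = np.random.default_rng(0)
ref_counts, ref_labels = simulate(300, demo_rng)
target_counts, target_truth = simulate(100, demo_rng)
predicted, nmf_model, classifier = transfer_labels(ref_counts, ref_labels,
                                                   target_counts, n_factors=2)
accuracy = float(np.mean(predicted == target_truth))
```

The interpretability mentioned above comes from the fitted objects: `nmf_model.components_` gives the gene loadings of each factor and `classifier.coef_` shows how each factor contributes to each label.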
Project description:The field of spatial transcriptomics is rapidly expanding, and with it the repertoire of available technologies. However, several of the transcriptome-wide spatial assays do not operate at the single-cell level, but rather produce data composed of contributions from a potentially heterogeneous mixture of cells. Still, these techniques are attractive for examining complex tissue specimens with diverse cell populations, where complete expression profiles are required to properly capture their richness. Motivated by an interest in putting gene expression into context and delineating the spatial arrangement of cell types within a tissue, we here present a model-based probabilistic method that uses single-cell data to deconvolve the cell mixtures in spatial data. To illustrate the capacity of our method, we use data from different experimental platforms and spatially map cell types from the mouse brain and developmental heart, which arrange as expected.
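To make the notion of deconvolving a cell mixture concrete, the sketch below estimates cell-type proportions for one spot with non-negative least squares. This deliberately swaps in plain NNLS (`scipy.optimize.nnls`) for the model-based probabilistic method described above, which is not reproduced here; `deconvolve_spot` and the synthetic signatures are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import nnls

def deconvolve_spot(signatures, spot_expression):
    """Estimate cell-type proportions in one spot (hypothetical helper).

    signatures:      (n_genes, n_types) mean expression per cell type,
                     e.g. averaged from annotated single-cell data
    spot_expression: (n_genes,) expression measured in the mixed spot
    """
    # Solve min ||signatures @ w - spot_expression|| subject to w >= 0.
    weights, _residual_norm = nnls(signatures, spot_expression)
    total = weights.sum()
    # Normalize to proportions; leave all-zero solutions untouched.
    return weights / total if total > 0 else weights

# Synthetic demonstration: a noiseless spot made of 10 cells of 3 types.
demo_rng = np.random.default_rng(0)
signatures = demo_rng.gamma(2.0, 1.0, size=(100, 3))
true_proportions = np.array([0.6, 0.3, 0.1])
spot = signatures @ (10 * true_proportions)
estimated = deconvolve_spot(signatures, spot)
```

On noiseless synthetic data the proportions are recovered up to numerical precision; real count data are noisy and overdispersed, which is one motivation for a probabilistic model instead of least squares.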
Project description:Single-cell RNA sequencing (scRNA-seq) data provide unprecedented information on cell fate decisions; however, the spatial arrangement of cells is often lost. Several recent computational methods have been developed to impute spatial information onto a scRNA-seq dataset by analyzing known spatial expression patterns of a small subset of genes, known as a reference atlas. However, there is a lack of comprehensive analysis of the accuracy, precision, and robustness of the mappings, along with the generalizability of these methods, which are often designed for specific systems. We present a system-adaptive deep learning-based method (DEEPsc) to impute spatial information onto a scRNA-seq dataset from a given spatial reference atlas. By introducing a comprehensive set of metrics that evaluate the spatial mapping methods, we compare DEEPsc with four existing methods on four biological systems. We find that while DEEPsc has comparable accuracy to other methods, it achieves an improved balance between precision and robustness. DEEPsc provides a data-adaptive tool to connect scRNA-seq datasets and spatial imaging datasets to analyze cell fate decisions. Our implementation with a uniform API can serve as a portal with access to all the methods investigated in this work for spatial exploration of cell fate decisions in scRNA-seq data. All methods evaluated in this work are implemented as open-source software with a uniform interface.
Project description:Single-cell and spatial transcriptome sequencing, two recently optimized transcriptome sequencing methods, are increasingly used to study cancer and related diseases. Cell annotation, particularly malignant cell annotation, is essential for in-depth analyses in these studies. However, current algorithms lack accuracy and generalization, making it difficult to consistently and rapidly infer malignant cells from pan-cancer data. To address this issue, we present Cancer-Finder, a domain generalization-based deep-learning algorithm that can rapidly identify malignant cells in single-cell data with an average accuracy of 95.16%. More importantly, by replacing the single-cell training data with spatial transcriptomic datasets, Cancer-Finder can accurately identify malignant spots on spatial slides. Applied to 5 clear cell renal cell carcinoma spatial transcriptomic samples, Cancer-Finder demonstrates a good ability to identify malignant spots and identifies a 10-gene signature whose genes are significantly co-localized and enriched at the tumor-normal interface and correlate strongly with the prognosis of clear cell renal cell carcinoma patients. In conclusion, Cancer-Finder is an efficient and extensible tool for malignant cell annotation.