Project description:Computational deconvolution is a time and cost-efficient approach to obtain cell type-specific information from bulk gene expression of heterogeneous tissues like blood. Deconvolution can aim to either estimate cell type proportions or abundances in samples, or estimate how strongly each present cell type expresses different genes, or both tasks simultaneously. Among the two separate goals, the estimation of cell type proportions/abundances is widely studied, but less attention has been paid on defining the cell type-specific expression profiles. Here, we address this gap by introducing a novel method Rodeo and empirically evaluating it and the other available tools from multiple perspectives utilizing diverse datasets.
Project description:The proliferation of single-cell RNA sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent development in single-cell DNA methylation (scDNAm), new avenues have been opened for deconvolving bulk DNAm data, particularly for solid tissues like the brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and those detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create a precise cell-type signature matrix that surpasses state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.
Project description:The quanta unit of the immune system is the cell, yet analyzed samples are often heterogeneous with respect to cell subsets which can mislead result interpretation. Experimentally, researchers face a difficult choice whether to profile heterogeneous samples with the ensuing confounding effects, or a priori focus on a few cell subsets of interest, potentially limiting new discoveries. An attractive alternative solution is to extract cell subset-specific information directly from heterogeneous samples via computational deconvolution techniques, thereby capturing both cell-centered and whole system level context. Such approaches are capable of unraveling novel biology, undetectable otherwise. Here we review the present state of available deconvolution techniques, their advantages and limitations, with a focus on blood expression data and immunological studies in general.
Project description:Spatially resolved transcriptomics performs high-throughput measurement of transcriptomes while preserving spatial information about the cellular organizations. However, many spatially resolved transcriptomic technologies can only distinguish spots consisting of a mixture of cells instead of working at single-cell resolution. Here, we present STdGCN, a graph neural network model designed for cell type deconvolution of spatial transcriptomic (ST) data that can leverage abundant single-cell RNA sequencing (scRNA-seq) data as reference. STdGCN is the first model incorporating the expression profiles from single cell data as well as the spatial localization information from the ST data for cell type deconvolution. Extensive benchmarking experiments on multiple ST datasets showed that STdGCN outperformed 14 published state-of-the-art models. Applied to a human breast cancer Visium dataset, STdGCN discerned spatial distributions between stroma, lymphocytes and cancer cells for tumor microenvironment dissection. In a human heart ST dataset, STdGCN detected the changes of potential endothelial-cardiomyocyte communications during tissue development.
Project description:The proliferation of single-cell RNA-sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell-type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), there are emerging opportunities for deconvolving bulk DNAm data, particularly for solid tissues like brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and those detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create precise cell-type signature matrixes that surpass state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.
Project description:Human tissue samples are often mixtures of heterogeneous cell types, which can confound the analyses of gene expression data derived from such tissues. The cell type composition of a tissue sample may itself be of interest and is needed for proper analysis of differential gene expression. A variety of computational methods have been developed to estimate cell type proportions using gene-level expression data. However, RNA isoforms can also be differentially expressed across cell types, and isoform-level expression could be equally or more informative for determining cell type origin than gene-level expression. We propose a new computational method, IsoDeconvMM, which estimates cell type fractions using isoform-level gene expression data. A novel and useful feature of IsoDeconvMM is that it can estimate cell type proportions using only a single gene, though in practice we recommend aggregating estimates of a few dozen genes to obtain more accurate results. We demonstrate the performance of IsoDeconvMM using a unique data set with cell type-specific RNA-seq data across more than 135 individuals. This data set allows us to evaluate different methods given the biological variation of cell type-specific gene expression data across individuals. We further complement this analysis with additional simulations.
Project description:ObjectiveWe describe herein a bioinformatics approach that leverages gene expression data from brain homogenates to derive cell-type specific differential expression results.ResultsWe found that differentially expressed (DE) cell-specific genes were mostly identified as neuronal, microglial, or endothelial in origin. However, a large proportion (75.7%) was not attributable to specific cells due to the heterogeneity in expression among brain cell types. Neuronal DE genes were consistently downregulated and associated with synaptic and neuronal processes as described previously in the field thereby validating this approach. We detected several DE genes related to angiogenesis (endothelial cells) and proteoglycans (oligodendrocytes).ConclusionsWe present a cost- and time-effective method exploiting brain homogenate DE data to obtain insights about cell-specific expression. Using this approach we identify novel findings in AD in endothelial cells and oligodendrocytes that were previously not reported.MethodsWe derived an enrichment score for each gene using a publicly available RNA profiling database generated from seven different cell types isolated from mouse cerebral cortex. We then classified the differential expression results from 3 publicly accessible Late-Onset Alzheimer's disease (AD) studies including seven different brain regions.
Project description:Spatially resolved transcriptomics integrates high-throughput transcriptome measurements with preserved spatial cellular organization information. However, many technologies cannot reach single-cell resolution. We present STdGCN, a graph model leveraging single-cell RNA sequencing (scRNA-seq) as reference for cell-type deconvolution in spatial transcriptomic (ST) data. STdGCN incorporates expression profiles from scRNA-seq and spatial localization from ST data for deconvolution. Extensive benchmarking on multiple datasets demonstrates that STdGCN outperforms 17 state-of-the-art models. In a human breast cancer Visium dataset, STdGCN delineates stroma, lymphocytes, and cancer cells, aiding tumor microenvironment analysis. In human heart ST data, STdGCN identifies changes in endothelial-cardiomyocyte communications during tissue development.
Project description:Knowledge of cell type composition in disease relevant tissues is an important step towards the identification of cellular targets of disease. We present MuSiC, a method that utilizes cell-type specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. By appropriate weighting of genes showing cross-subject and cross-cell consistency, MuSiC enables the transfer of cell type-specific gene expression information from one dataset to another. When applied to pancreatic islet and whole kidney expression data in human, mouse, and rats, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables the characterization of cellular heterogeneity of complex tissues for understanding of disease mechanisms. As bulk tissue data are more easily accessible than single-cell RNA-seq, MuSiC allows the utilization of the vast amounts of disease relevant bulk tissue RNA-seq data for elucidating cell type contributions in disease.
Project description:A challenge in bulk gene differential expression analysis is to differentiate changes due to cell type-specific gene expression and cell type proportions. SCADIE is an iterative algorithm that simultaneously estimates cell type-specific gene expression profiles and cell type proportions, and performs cell type-specific differential expression analysis at the group level. Through its unique penalty and objective function, SCADIE more accurately identifies cell type-specific differentially expressed genes than existing methods, including those that may be missed from single cell RNA-Seq data. SCADIE has robust performance with respect to the choice of deconvolution methods and the sources and quality of input data.