Project description: The proliferation of single-cell RNA sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell-type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), new avenues have been opened for deconvolving bulk DNAm data, particularly for solid tissues like the brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and the detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create a precise cell-type signature matrix that surpasses state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell-type-specific differentially methylated cytosines associated with Alzheimer's disease.
Project description: The nature and extent of immune cell infiltration into solid tumours are key determinants of therapeutic response. Here, using a DNA methylation-based approach to tumour cell fraction deconvolution, we report the integrated analysis of tumour composition and genomics across a wide spectrum of solid cancers. Initially studying head and neck squamous cell carcinoma, we identify two distinct tumour subgroups: 'immune hot' and 'immune cold', which display differing prognosis, mutation burden, cytokine signalling, cytolytic activity and oncogenic driver events. We demonstrate the existence of such tumour subgroups pan-cancer, link clonal-neoantigen burden to cytotoxic T-lymphocyte infiltration, and show that transcriptional signatures of hot tumours are selectively engaged in immunotherapy responders. We also find that treatment-naive hot tumours are markedly enriched for known immune-resistance genomic alterations, potentially explaining the heterogeneity of immunotherapy response and prognosis seen within this group. Finally, we define a catalogue of mediators of active antitumour immunity, deriving candidate biomarkers and potential targets for precision immunotherapy.
Project description: The cell type composition of many biological tissues varies widely across samples. Such sample heterogeneity hampers efforts to probe the role of each cell type in the tissue microenvironment. Current approaches that address this issue have drawbacks. Cell sorting or single-cell-based experimental techniques disrupt in situ interactions and alter the physiological status of cells in tissues. Computational methods are flexible and promising, but they often estimate either sample-specific proportions of each cell type or cell-type-specific gene expression profiles, not both, because they require the other as input. We introduce a computational Complete Deconvolution method (CDSeq) that can estimate both sample-specific proportions of each cell type and cell-type-specific gene expression profiles simultaneously using bulk RNA-Seq data only. We assessed our method’s performance using several synthetic and experimental mixtures of varied but known cell-type composition and compared it with two state-of-the-art deconvolution methods on the same mixtures. The results showed that CDSeq can estimate both sample-specific proportions of each component cell type and cell-type-specific gene expression profiles with high accuracy. CDSeq holds promise for computationally deciphering complex mixtures of cell types, each with differing expression profiles, using RNA-seq data measured in bulk tissue.
Project description: DNA methylation microarrays can be employed to interrogate cell-type composition in complex tissues. Here, we expand reference-based deconvolution of blood DNA methylation to include 12 leukocyte subtypes (neutrophils, eosinophils, basophils, monocytes, naïve and memory B cells, naïve and memory CD4+ and CD8+ T cells, natural killer cells, and T regulatory cells). Including derived variables, our method provides 56 immune profile variables. The IDOL (IDentifying Optimal Libraries) algorithm was used to identify libraries for deconvolution of DNA methylation data for current and previous platforms. The accuracy of deconvolution estimates obtained using our enhanced libraries was validated using artificial mixtures and whole-blood DNA methylation with known cellular composition from flow cytometry. We applied our libraries to deconvolve cancer, aging, and autoimmune disease datasets. In conclusion, these libraries enable a detailed representation of immune-cell profiles in blood using only DNA and facilitate a standardized, thorough investigation of immune profiles in human health and disease.
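The general idea behind reference-based deconvolution can be illustrated with a minimal sketch. It assumes the bulk methylation profile is a nonnegative mixture of per-cell-type reference profiles and solves for the mixture weights by projected gradient descent on a least-squares objective. This is a generic illustration, not the method used in the abstract above; the function name `deconvolve`, the reference values, and the three-cell-type setup are all hypothetical.

```python
def deconvolve(signature, bulk, iterations=2000):
    """Estimate cell-type fractions by nonnegative least squares.

    signature: list of rows, one per CpG site; each row holds the reference
               methylation level (0..1) of that site in every cell type.
    bulk:      observed methylation levels of the mixture, one per CpG site.
    """
    n_sites = len(signature)
    n_types = len(signature[0])
    # The trace of S^T S bounds its largest eigenvalue from above, so
    # 1 / trace is a safe (if conservative) gradient step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(signature[i][k] * weights[k] for k in range(n_types)) - bulk[i]
            for i in range(n_sites)
        ]
        gradient = [
            sum(signature[i][k] * residual[i] for i in range(n_sites))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the nonnegative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    total = sum(weights)
    # Rescale so the estimates can be read as fractions summing to 1.
    return [w / total for w in weights] if total > 0 else weights


# Hypothetical reference: 5 CpG sites x 3 cell types (illustrative values only).
reference = [
    [0.90, 0.10, 0.20],
    [0.10, 0.80, 0.30],
    [0.20, 0.20, 0.90],
    [0.80, 0.70, 0.10],
    [0.05, 0.90, 0.85],
]
true_fractions = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample as an exact mixture of the references.
bulk_profile = [sum(s * f for s, f in zip(row, true_fractions)) for row in reference]
estimated = deconvolve(reference, bulk_profile)
print([round(f, 3) for f in estimated])
```

With noise-free input the estimate should match the simulated fractions closely; real data would add measurement noise and typically use a dedicated quadratic-programming or NNLS solver.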
Project description: Introduction: The human brain comprises heterogeneous cell types whose composition can be altered with physiological and pathological conditions. New approaches to discern the diversity and distribution of brain cells associated with neurological conditions would significantly advance the study of brain-related pathophysiology and neuroscience. Unlike single-nuclei approaches, DNA methylation-based deconvolution does not require special sample handling or processing, is cost-effective, and easily scales to large study designs. Existing DNA methylation-based methods for brain cell deconvolution are limited in the number of cell types deconvolved. Methods: Using DNA methylation profiles of the top cell-type-specific differentially methylated CpGs, we employed a hierarchical modeling approach to deconvolve GABAergic neurons, glutamatergic neurons, astrocytes, microglial cells, oligodendrocytes, endothelial cells, and stromal cells. Results: We demonstrate the utility of our method by applying it to data on normal tissues from various brain regions and on aging and diseased tissues, including Alzheimer's disease, autism, Huntington's disease, epilepsy, and schizophrenia. Discussion: We expect that the ability to determine the cellular composition in the brain using only DNA from bulk samples will accelerate understanding of brain cell type composition and cell-type-specific epigenetic states in normal and diseased brain tissues.
Project description: The discovery of N6-methyldeoxyadenine (6mA) across eukaryotes led to a search for additional epigenetic mechanisms. However, some studies have highlighted confounding factors that challenge the prevalence of 6mA in eukaryotes. We developed a metagenomic method to quantitatively deconvolve 6mA events from a genomic DNA sample into species of interest, genomic regions, and sources of contamination. Applying this method, we observed high-resolution 6mA deposition in two protozoa. We found that commensal or soil bacteria explained the vast majority of 6mA in insect and plant samples. We found no evidence of high abundance of 6mA in Drosophila, Arabidopsis, or humans. Plasmids used for genetic manipulation, even those from Dam methyltransferase mutant Escherichia coli, could carry abundant 6mA, confounding the evaluation of candidate 6mA methyltransferases and demethylases. On the basis of this work, we advocate for a reassessment of 6mA in eukaryotes.
Project description: Background: Confounding due to cellular heterogeneity represents one of the foremost challenges currently facing Epigenome-Wide Association Studies (EWAS). Statistical methods leveraging the tissue-specificity of DNA methylation for deconvoluting the cellular mixture of heterogeneous biospecimens offer a promising solution; however, the performance of such methods depends entirely on the library of methylation markers being used for deconvolution. Here, we introduce a novel algorithm for Identifying Optimal Libraries (IDOL) that dynamically scans a candidate set of cell-specific methylation markers to find libraries that optimize the accuracy of cell fraction estimates obtained from cell mixture deconvolution. Results: Application of IDOL to a training set consisting of samples with both whole-blood DNA methylation data (Illumina HumanMethylation450 BeadArray (HM450)) and flow cytometry measurements of cell composition revealed an optimized library comprising 300 CpG sites. When compared with existing libraries, the library identified by IDOL demonstrated significantly better overall discrimination of the entire immune cell landscape (p = 0.038) and resulted in improved discrimination of 14 out of the 15 pairs of leukocyte subtypes. Estimates of cell composition across the samples in the training set using the IDOL library were highly correlated with their respective flow cytometry measurements, with all cell-specific R² > 0.99 and root mean square errors (RMSEs) ranging from 0.97% to 1.33% across leukocyte subtypes. Independent validation of the optimized IDOL library using two additional HM450 data sets showed similarly strong prediction performance, with all cell-specific R² > 0.90 and RMSE < 4.00%. In simulation studies, adjustments for cell composition using the IDOL library resulted in uniformly lower false positive rates compared to competing libraries, while also demonstrating an improved capacity to explain epigenome-wide variation in DNA methylation within two large publicly available HM450 data sets. Conclusions: Despite consisting of half as many CpGs as existing libraries for whole blood mixture deconvolution, the optimized IDOL library identified herein resulted in outstanding prediction performance across all considered data sets and demonstrated potential to improve the operating characteristics of EWAS involving adjustments for cell distribution. In addition to providing the EWAS community with an optimized library for whole blood mixture deconvolution, our work establishes a systematic and generalizable framework for the assembly of libraries that improve the accuracy of cell mixture deconvolution.
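The two accuracy measures quoted in this abstract, cell-specific R² and RMSE, can be computed as in the sketch below. It assumes R² is taken as the squared Pearson correlation between estimated and flow cytometry percentages of one cell type across samples; the abstract does not state which definition was used, and the function names and sample values are hypothetical.

```python
import math


def rmse(estimated, measured):
    """Root mean square error, in the same units as the inputs."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)


def r_squared(estimated, measured):
    """Squared Pearson correlation between estimates and measurements."""
    n = len(measured)
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov * cov / (var_e * var_m)


# Hypothetical percentages of one cell type in four samples.
flow_pct = [60.0, 55.0, 48.0, 70.0]       # flow cytometry measurements
estimated_pct = [61.0, 54.0, 49.0, 69.0]  # deconvolution estimates

error = rmse(estimated_pct, flow_pct)
fit = r_squared(estimated_pct, flow_pct)
print(f"RMSE = {error:.2f} percentage points, R^2 = {fit:.4f}")
```

Reporting both measures is useful because a high R² alone does not rule out a systematic offset, whereas RMSE captures the absolute size of the error.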
Project description: Estimating the abundance of cell-free DNA (cfDNA) fragments shed from a tumor (i.e., circulating tumor DNA (ctDNA)) can approximate tumor burden, which has numerous clinical applications. We derived a novel, broadly applicable statistical method to quantify cancer-indicative methylation patterns within cfDNA to estimate ctDNA abundance, even at low levels. Our algorithm identified differentially methylated regions (DMRs) between a reference database of cancer tissue biopsy samples and cfDNA from individuals without cancer. Then, without utilizing matched tissue biopsy, counts of fragments matching the cancer-indicative hyper/hypo-methylated patterns within DMRs were used to determine a tumor methylated fraction (TMeF; a methylation-based quantification of the circulating tumor allele fraction and estimate of ctDNA abundance) for plasma samples. TMeF and small variant allele fraction (SVAF) estimates of the same cancer plasma samples were correlated (Spearman's correlation coefficient: 0.73), and synthetic dilutions to expected TMeF of 10⁻³ and 10⁻⁴ had estimated TMeF within two-fold for 95% and 77% of samples, respectively. TMeF increased with cancer stage and tumor size and inversely correlated with survival probability. Therefore, tumor-derived fragments in the cfDNA of patients with cancer can be leveraged to estimate ctDNA abundance without the need for a tumor biopsy, which may provide non-invasive clinical approximations of tumor burden.
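The "within two-fold" criterion used for the synthetic dilutions can be made concrete with a short sketch. It assumes an estimate counts as within two-fold when it lies between half and twice the expected value, inclusive; the function name `share_within_fold` and the sample values are hypothetical and are not taken from the study.

```python
def share_within_fold(estimates, expected, fold=2.0):
    """Fraction of estimates lying within `fold`-fold of the expected value."""
    lower = expected / fold
    upper = expected * fold
    hits = sum(1 for value in estimates if lower <= value <= upper)
    return hits / len(estimates)


# Hypothetical estimated fractions for dilutions with an expected value of 1e-3.
expected_fraction = 1e-3
estimated_fractions = [0.8e-3, 1.9e-3, 2.5e-3, 0.4e-3]

share = share_within_fold(estimated_fractions, expected_fraction)
print(f"{share:.0%} of samples within two-fold")
```

A fold-based tolerance is symmetric on a log scale, which suits quantities such as ctDNA fractions that span several orders of magnitude.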
Project description: We performed a systematic assessment of computational deconvolution methods, which play an important role in the estimation of cell type proportions from bulk methylation data. The proposed framework, methylDeConv (available as an R package), integrates several deconvolution methods for methylation profiles (Illumina HumanMethylation450 and MethylationEPIC arrays) and offers different cell-type-specific CpG selection options to construct an extended reference library, which incorporates the main immune cell subsets, epithelial cells and cell-free DNA. We compared the performance of different deconvolution algorithms via simulations and benchmark datasets and further investigated the associations of the estimated cell type proportions with cancer therapy in breast cancer and with subtypes in melanoma methylation case studies. Our results indicated that deconvolution based on the extended reference library is critical for obtaining accurate estimates of cell proportions in non-blood tissues.