Project description:An underlying question for virtually all single-cell RNA sequencing experiments is how to allocate the limited sequencing budget: deep sequencing of a few cells or shallow sequencing of many cells? Here we present a mathematical framework which reveals that, for estimating many important gene properties, the optimal allocation is to sequence at a depth of around one read per cell per gene. Interestingly, the corresponding optimal estimator is not the widely-used plug-in estimator, but one developed via empirical Bayes.
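The depth-versus-breadth trade-off described above can be made concrete with a little arithmetic. The sketch below is not the paper's framework or its empirical Bayes estimator. It only illustrates one reading of "one read per cell per gene", namely that the gene of interest should receive a mean of about one read in each cell. The function name `allocate_budget` and the `gene_cpm` input (the gene's expression in reads per million sequenced reads) are assumptions introduced for illustration.

```python
def allocate_budget(total_reads, gene_cpm, target_reads_per_cell=1.0):
    """Split a fixed read budget into (reads per cell, number of cells).

    gene_cpm is a hypothetical input: the expression of the gene of
    interest in reads per million sequenced reads. The default target of
    1.0 is an assumed reading of "one read per cell per gene".
    """
    if total_reads <= 0 or gene_cpm <= 0 or target_reads_per_cell <= 0:
        raise ValueError("all inputs must be positive")
    # Depth at which the gene receives the target mean number of reads per cell.
    reads_per_cell = target_reads_per_cell * 1_000_000 / gene_cpm
    # Number of whole cells the budget covers at that depth.
    n_cells = int(total_reads // reads_per_cell)
    return reads_per_cell, n_cells


depth, cells = allocate_budget(total_reads=1_000_000_000, gene_cpm=100)
print(f"{depth:.0f} reads per cell across {cells} cells")
```

Under these assumptions, a lowly expressed gene pushes the allocation toward deeper sequencing of fewer cells, while a highly expressed gene allows shallower sequencing of many more cells within the same budget.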
Project description:Droplet-based single-cell RNA-seq (scRNA-seq) data are plagued by ambient contamination caused by nucleic acid material released by dead and dying cells. This material is mixed into the buffer and is co-encapsulated with cells, leading to a lower signal-to-noise ratio. Although computational methods exist to remove ambient contamination post hoc, it remains uncertain how reliably algorithms can generate high-quality data from low-quality sources. Here, we assess data quality before data filtering using a set of quantitative, contamination-based metrics that are more effective than standard metrics. Through a series of controlled experiments, we report improvements that can minimize ambient contamination outside of tissue dissociation, via cell fixation, improved cell loading, microfluidic dilution, and nuclei versus cell preparation; many of these parameters are inaccessible on commercial platforms. We provide end-users with insights on factors that can guide their decision-making regarding optimizations that minimize ambient contamination, and with metrics to assess data quality.
Project description:Background: Direct cDNA preamplification protocols developed for single-cell RNA-seq have enabled transcriptome profiling of precious clinical samples and rare cell populations without the need for sample pooling or RNA extraction. We term the use of single-cell chemistries for sequencing low numbers of cells limiting-cell RNA-seq (lcRNA-seq). Currently, there is no customized algorithm to select robust, low-noise transcripts from lcRNA-seq data for between-group comparisons. Methods: Herein, we present CLEAR, a workflow that identifies reliably quantifiable transcripts in lcRNA-seq data for differentially expressed gene (DEG) analysis. Total RNA obtained from primary chronic lymphocytic leukemia (CLL) CD5+ and CD5- cells was used to develop the CLEAR algorithm. Once established, the performance of CLEAR was evaluated with FACS-sorted cells enriched from mouse dentate gyrus (DG). Results: In CLL samples, using CLEAR transcripts rather than all transcripts yielded a higher proportion of shared transcripts across three input amounts and improved principal component analysis (PCA) separation of the two cell types. In mouse DG samples, CLEAR identified noisy transcripts, and their removal improved PCA separation of the anticipated cell populations. In addition, CLEAR was applied to two publicly available datasets to demonstrate its utility in lcRNA-seq data from other institutions. If imputation is applied to limit the effect of missing data points, CLEAR can also be used in large clinical trials and in single-cell studies. Conclusions: lcRNA-seq coupled with CLEAR is widely used in our institution for profiling immune cells (circulating or tissue-infiltrating) for its transcript preservation characteristics. CLEAR fills an important niche in pre-processing lcRNA-seq data to facilitate transcriptome profiling and DEG analysis. We demonstrate the utility of CLEAR in analyzing rare cell populations in clinical samples and in the murine neural DG region without sample pooling.
Project description:Single-cell RNA-seq data contain many dropouts, caused by the low number and inefficient capture of mRNAs in individual cells, which hamper downstream analyses. Here, we present Epi-Impute, a computational method for dropout imputation by reconciling expression and epigenomic data. Epi-Impute leverages single-cell ATAC-seq data as an additional source of information about gene activity to reduce the number of dropouts. We demonstrate that Epi-Impute outperforms existing methods, especially for very sparse single-cell RNA-seq data sets, significantly reducing imputation error. At the same time, Epi-Impute accurately captures the primary distribution of gene expression across cells while preserving the gene-gene and cell-cell relationships in the data. Moreover, by increasing the resolution of scRNA-seq data through imputation, Epi-Impute allows for the discovery of functionally relevant cell clusters.
Project description:Single-cell RNA-seq data contain a large proportion of zeros for expressed genes. Such dropout events present a fundamental challenge for various types of data analyses. Here, we describe the SCRABBLE algorithm to address this problem. SCRABBLE leverages bulk data as a constraint and reduces unwanted bias towards expressed genes during imputation. Using both simulation and several types of experimental data, we demonstrate that SCRABBLE outperforms existing methods in recovering dropout events, capturing the true distribution of gene expression across cells, and preserving the gene-gene and cell-cell relationships in the data.
Project description:Introduction: Glioma is the most frequent and lethal form of primary brain tumor. The molecular mechanism of oncogenesis and progression of glioma remains unclear, rendering the therapeutic effect of conventional radiotherapy, chemotherapy, and surgical resection insufficient. In this study, we sought to explore the function in glioma of HEC1 (highly expressed in cancer 1), a component of the NDC80 complex that is crucial in the regulation of the kinetochore. Methods: Bulk RNA-seq and scRNA-seq analyses were used to infer HEC1 function, and in vitro experiments validated its function. Results: HEC1 overexpression was observed in glioma and was indicative of poor prognosis and malignant clinical features, which was confirmed in human glioma tissues. High HEC1 expression was correlated with a more active cell cycle, DNA-associated activities, and the formation of an immunosuppressive tumor microenvironment, including interaction with immune cells, and correlated strongly with infiltrating immune cells and enhanced expression of immune checkpoints. In vitro experiments and RNA-seq further confirmed the role of HEC1 in promoting cell proliferation and the expression of DNA replication and repair pathways in glioma. A coculture assay confirmed that HEC1 promotes microglial migration and the transformation of M1 phenotype macrophages to the M2 phenotype. Conclusion: Altogether, these findings demonstrate that HEC1 may be a potential prognostic marker and an immunotherapeutic target in glioma.
Project description:Background: Kidney renal clear cell carcinoma (KIRC) is a major subtype of renal cell carcinoma with poor prognosis due to its invasive and metastatic nature. Despite advances in understanding the molecular underpinnings of various cancers, the role of branched-chain amino acid transferase 1 (BCAT1) in KIRC remains underexplored. This study aims to fill this gap by investigating the oncogenic role of BCAT1 in KIRC using single-cell RNA-seq data and experimental validation. Methods: The single-cell transcriptomic dataset GSE159115 was used to investigate potential biomarkers in KIRC. After screening, we selected BCAT1 as the target gene and investigated its function and mechanism in KIRC through databases such as TCGA-GTEx, using gene set enrichment analysis (GSEA), gene set variation analysis (GSVA), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). BCAT1 expression was detected in clinical tissue samples using Western blotting (WB) and immunohistochemical (IHC) staining. We established cell lines with stable BCAT1 overexpression and knockdown and performed WB, qRT-PCR, cell scratch assays and transwell assays. Results: BCAT1 was highly expressed in KIRC and was associated with disease prognosis and the tumor microenvironment (TME). Patients with mutations in the BCAT1 gene had shorter overall survival (OS) and disease-free survival (DFS). Patients with high BCAT1 expression had shorter OS, progression-free interval (PFI), and disease-specific survival (DSS). GSEA showed that BCAT1 was significantly enriched in epithelial-mesenchymal transition (EMT). Bioinformatics analysis, WB and IHC staining showed that BCAT1 expression was higher in KIRC than in paracancerous tissues. In vitro experiments confirmed that BCAT1 in KIRC cells may promote EMT, affecting their invasion and migration. We constructed a protein-protein interaction (PPI) network to hypothesize proteins that may interact with BCAT1. Single-sample gene set enrichment analysis (ssGSEA) revealed the immune infiltration environment of BCAT1. Furthermore, hypomethylation of the BCAT1 promoter region in KIRC may contribute to disease progression by promoting BCAT1 expression. Conclusion: BCAT1 promotes KIRC invasion and metastasis through EMT and has prognostic value and potential as a novel biomarker.