Project description:To investigate the gene regulatory mechanisms driving T cell development, we generated single-cell transcriptomics and chromatin accessibility data from a human fetal thymus sample at 10 weeks of gestation.
Project description:Bufotoxin is an endogenous toxin made up of several physiologically active components that toads deploy as a defense against their natural enemies. Bufadienolides (BDS), which are isolated from bufotoxin, are important anticancer drugs, and other components such as bufotenine and alkaloids are also important drug resources. The distribution characteristics and biosynthesis of bufotoxins in the postauricular glands (PGs) of toads are not well understood. We examined the toad's PGs using the MALDI-MSI technique; a total of 1,872 components were found, and some pharmacological components were visible. These findings indicate that bufotoxins are primarily abundant in the plasma glands (pG) and epidermal tissues of the glands. Using single-cell sequencing, we created a single-cell atlas of 9,316 PG cells. These cells were categorized into nine clusters using marker genes, and two types of epithelial cells were verified by in situ hybridization. Because cholesterol was confirmed to be a precursor of BDS biosynthesis, we concentrated on cholesterol metabolism and, through transcriptomic studies of two gland types with distinct secretory functions, the pG and the mucous glands (MG), postulated the primary bile acid pathway as a downstream biosynthesis pathway of BDS. Optimal and silenced genes for potential BDS synthesis pathways, toad toxin tryptamine and alkaloid biosynthesis, terpene skeleton biosynthesis and steroid hormone biosynthesis were identified by calculating the cellular coverage of genes. Our data demonstrate the metabolic mapping of bufotoxins in the PGs of the toad and create the first single-cell atlas of toad PGs, providing a reference for the study of the biosynthesis of natural active ingredients in animals.
Project description:MOTIVATION: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. RESULTS: Here we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and 6 different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experimental design and comparison of scRNA-seq computational methods based on specific research goals. AVAILABILITY AND IMPLEMENTATION: We have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/scDesign. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
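The depth-versus-breadth trade-off described above can be illustrated with simple arithmetic. The sketch below is not scDesign (which is an R package with its own statistical model); it only assumes a fixed total read budget split evenly across cells. The function name `candidate_designs`, the budget and the cell numbers are hypothetical.

```python
from typing import List, Tuple


def candidate_designs(total_reads: int, cell_numbers: List[int]) -> List[Tuple[int, int]]:
    """Return (n_cells, reads_per_cell) pairs for a fixed total read budget.

    Assumption: reads are split evenly across cells, which real experiments
    only approximate.
    """
    designs = []
    for n_cells in cell_numbers:
        if n_cells <= 0:
            raise ValueError("cell numbers must be positive")
        # With a fixed budget, sequencing more cells leaves fewer reads per cell.
        designs.append((n_cells, total_reads // n_cells))
    return designs


if __name__ == "__main__":
    # Hypothetical budget of 100 million reads.
    for n_cells, depth in candidate_designs(100_000_000, [1_000, 5_000, 20_000]):
        print(f"{n_cells} cells -> about {depth} reads per cell")
```

Each candidate pair would then have to be assessed for the analysis of interest (for example, differential expression), which is the part a simulator such as scDesign addresses.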
Project description:Despite the growing availability of sophisticated bioinformatic methods for the analysis of single-cell RNA-seq data, few tools exist that allow biologists without extensive bioinformatic expertise to directly visualize and interact with their own data and results. Here, we present Cerebro (cell report browser), a Shiny- and Electron-based standalone desktop application for macOS and Windows which allows investigation and inspection of pre-processed single-cell transcriptomics data without requiring bioinformatic experience of the user. Through an interactive and intuitive graphical interface, users can (i) explore similarities and heterogeneity between samples and cell clusters in two-dimensional or three-dimensional projections such as t-SNE or UMAP, (ii) display the expression level of single genes or gene sets of interest, (iii) browse tables of most expressed genes and marker genes for each sample and cluster and (iv) display trajectories calculated with Monocle 2. We provide three examples prepared from publicly available datasets to show how Cerebro can be used and what its capabilities are. Through a focus on flexibility and direct access to data and results, we think Cerebro offers a collaborative framework for bioinformaticians and experimental biologists that facilitates effective interaction to shorten the gap between analysis and interpretation of the data. Availability and implementation: The Cerebro application, additional documentation, and example datasets are available at https://github.com/romanhaa/Cerebro. Similarly, the cerebroApp R package is available at https://github.com/romanhaa/cerebroApp. All components are released under the MIT License. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Single-cell technologies are emerging fast due to their ability to unravel the heterogeneity of biological systems. While scRNA-seq is a powerful tool that measures whole-transcriptome expression of single cells, it does not capture their spatial localization. Novel spatial transcriptomics methods do retain cells' spatial information, but some can only measure tens to hundreds of transcripts. To resolve this discrepancy, we developed SpaGE, a method that integrates spatial and scRNA-seq datasets to predict whole-transcriptome expression in its spatial configuration. Using five dataset-pairs, SpaGE outperformed previously published methods and showed scalability to large datasets. Moreover, SpaGE predicted new spatial gene patterns that were confirmed independently using in situ hybridization data from the Allen Mouse Brain Atlas.
Project description:Single-cell sample multiplexing technologies function by associating sample-specific barcode tags with cell-specific barcode tags, thereby increasing sample throughput, reducing batch effects, and decreasing reagent costs. Computational methods must then correctly associate cell-tags with sample-tags, but their performance deteriorates rapidly when working with datasets that are large, have imbalanced cell numbers across samples, or are noisy due to cross-contamination among sample tags - unavoidable features of many real-world experiments. Here we introduce deMULTIplex2, a mechanism-guided classification algorithm for multiplexed scRNA-seq data that successfully recovers many more cells across a spectrum of challenging datasets compared to existing methods. deMULTIplex2 is built on a statistical model of tag read counts derived from the physical mechanism of tag cross-contamination. Using generalized linear models and expectation-maximization, deMULTIplex2 probabilistically infers the sample identity of each cell and classifies singlets with high accuracy. Using Randomized Quantile Residuals, we show the model fits both simulated and real datasets. Benchmarking analysis suggests that deMULTIplex2 outperforms existing algorithms, especially when handling large and noisy single-cell datasets or those with unbalanced sample compositions.
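To make the classification task concrete, the sketch below shows a naive baseline that assigns each cell to its dominant sample tag. It is explicitly not the deMULTIplex2 algorithm (which, per the description above, uses generalized linear models and expectation-maximization); the function name `classify_cell` and the thresholds `min_total` and `min_fraction` are hypothetical choices for illustration.

```python
from typing import Dict


def classify_cell(
    tag_counts: Dict[str, int],
    min_total: int = 10,         # hypothetical: fewer tag reads than this is "negative"
    min_fraction: float = 0.8,   # hypothetical: dominance needed to call a singlet
) -> str:
    """Naive baseline: call the sample whose tag dominates the cell's tag reads."""
    total = sum(tag_counts.values())
    if total < min_total:
        return "negative"
    top_tag, top_count = max(tag_counts.items(), key=lambda kv: kv[1])
    if top_count / total >= min_fraction:
        return top_tag
    # Two or more tags share the reads: a multiplet, or cross-contamination.
    return "ambiguous"


if __name__ == "__main__":
    print(classify_cell({"sampleA": 95, "sampleB": 5}))
    print(classify_cell({"sampleA": 50, "sampleB": 50}))
    print(classify_cell({"sampleA": 2, "sampleB": 1}))
```

A fixed-threshold rule like this is the kind of approach that tends to degrade on noisy or imbalanced data, since contamination can push true singlets below the dominance cutoff; a model of the contamination mechanism is one way to avoid hand-tuned thresholds.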
Project description:CRISPR-based genome perturbation provides a new avenue to conveniently change DNA sequences, transcription, and epigenetic modifications in genetic screens. However, it remains challenging to assay the complex molecular readouts after perturbation at high resolution and at scale. By introducing an A/G mixed capture sequence into the gRNA scaffold, we demonstrate that gRNA transcripts can be directly reverse transcribed by a poly(dT) primer together with the endogenous mRNA, followed by high-content molecular phenotyping in scRNA-seq (Direct-seq). With this method, the CRISPR perturbation and its transcriptional readouts can be profiled together in a streamlined workflow.
Project description:Background: Single-cell RNA sequencing (scRNA-seq) has emerged as a main strategy to study transcriptional activity at the cellular level. Clustering analysis is routinely performed on scRNA-seq data to explore, recognize or discover underlying cell identities. The high dimensionality of scRNA-seq data and its significant sparsity, accentuated by frequent dropout events that introduce false zero count observations, make the clustering analysis computationally challenging. Even though multiple scRNA-seq clustering techniques have been proposed, there is no consensus on the best performing approach. On a parallel research track, self-supervised contrastive learning recently achieved state-of-the-art results on image clustering and, subsequently, image classification. Results: We propose contrastive-sc, a new unsupervised learning method for scRNA-seq data that performs cell clustering. The method consists of two consecutive phases: first, an artificial neural network learns an embedding for each cell through a representation training phase. The embedding is then clustered in the second phase with a general clustering algorithm (e.g. KMeans or Leiden community detection). The proposed representation training phase is a new adaptation of the self-supervised contrastive learning framework, initially proposed for image processing, to scRNA-seq data. contrastive-sc has been compared with ten state-of-the-art techniques. A broad experimental study has been conducted on both simulated and real-world datasets, assessing multiple external and internal clustering performance metrics (i.e. ARI, NMI, Silhouette, Calinski scores). Our experimental analysis shows that contrastive-sc compares favorably with state-of-the-art methods on both simulated and real-world datasets. Conclusion: On average, our method identifies well-defined clusters in close agreement with ground truth annotations.
Our method is computationally efficient, being fast to train and having a limited memory footprint. contrastive-sc maintains good performance when only a fraction of input cells is provided and is robust to changes in hyperparameters or network architecture. The decoupling between the creation of the embedding and the clustering phase allows the flexibility to choose a suitable clustering algorithm (e.g. KMeans when the number of expected clusters is known, Leiden otherwise) or to integrate the embedding with other existing techniques.
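Agreement between predicted clusters and ground truth annotations is commonly scored with the adjusted Rand index (ARI), one of the external metrics named above. The following is a minimal standard-library sketch of the usual pair-counting formula; it is not taken from the contrastive-sc code, and the function name `adjusted_rand_index` is an assumption for illustration.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the labelings trivially agree

    # Pairs of cells grouped together in both labelings, and in each one alone.
    sum_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_truth * sum_pred / total_pairs   # agreement expected by chance
    max_index = 0.5 * (sum_truth + sum_pred)
    if max_index == expected:
        return 1.0  # both labelings are trivial (one cluster, or all singletons)
    return (sum_both - expected) / (max_index - expected)


if __name__ == "__main__":
    annotations = ["T", "T", "B", "B", "NK", "NK"]
    clusters = [2, 2, 0, 0, 1, 1]  # same partition, different label names
    print(adjusted_rand_index(annotations, clusters))
```

Because ARI compares partitions rather than label names, it applies equally to KMeans or Leiden output on the learned embedding, which fits the decoupling of embedding and clustering described above.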