Project description: Recent technological advancements in single-cell genomics have enabled joint profiling of gene expression and alternative modalities at unprecedented scale. Consequently, the complexity of multi-omics data sets is increasing rapidly. Existing models for multi-modal data are typically limited in functionality or scalability, making data integration and downstream analysis cumbersome. We present multiDGD, a scalable deep generative model that provides a probabilistic framework for learning shared representations of transcriptome and chromatin accessibility. It shows outstanding reconstruction performance without requiring feature selection. We demonstrate on several data sets from human and mouse that multiDGD learns well-clustered joint representations. We further find that probabilistic modeling of sample covariates enables post-hoc data integration without fine-tuning. Additionally, we show that multiDGD can detect statistical associations between genes and regulatory regions, conditioned on the learned representations. multiDGD is available as an scverse-compatible package on GitHub.
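The name suggests a decoder-only "deep generative decoder" (DGD) setup, in which per-cell representations are free parameters optimized jointly with a multi-headed decoder rather than produced by an encoder. Below is a minimal PyTorch sketch of that idea under stated assumptions (toy data, a Poisson likelihood for RNA counts, a Bernoulli likelihood for accessibility, and a simple Gaussian penalty standing in for the model's full prior); it is not the package's actual API.

    import torch
    import torch.nn as nn

    n_cells, n_genes, n_peaks, d = 500, 2000, 5000, 20

    # Per-cell representations are free parameters (no encoder network)
    z = nn.Parameter(torch.randn(n_cells, d) * 0.1)

    # Shared trunk with one output head per modality
    trunk = nn.Sequential(nn.Linear(d, 128), nn.ReLU())
    rna_head = nn.Linear(128, n_genes)    # log-rates for Poisson RNA counts
    atac_head = nn.Linear(128, n_peaks)   # logits for Bernoulli accessibility

    rna = torch.poisson(torch.rand(n_cells, n_genes) * 3)    # toy count matrix
    atac = (torch.rand(n_cells, n_peaks) < 0.1).float()      # toy binary peaks

    net_params = (list(trunk.parameters()) + list(rna_head.parameters())
                  + list(atac_head.parameters()))
    opt = torch.optim.Adam([{"params": net_params, "lr": 1e-3},
                            {"params": [z], "lr": 1e-2}])

    for step in range(200):
        h = trunk(z)
        loss = (nn.functional.poisson_nll_loss(rna_head(h), rna, log_input=True)
                + nn.functional.binary_cross_entropy_with_logits(atac_head(h), atac)
                + 1e-4 * z.pow(2).mean())    # crude stand-in for the prior on z
        opt.zero_grad()
        loss.backward()
        opt.step()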
Project description: Deciphering the features, structure, and functions of the cell niche in tissues remains a major challenge. Here, we present scNiche, a computational framework to identify and characterize cell niches from spatial omics data at single-cell resolution. We benchmark scNiche on both simulated and biological datasets and demonstrate that it identifies cell niches effectively and robustly, outperforming existing methods. In spatial proteomics data from human triple-negative breast cancer, scNiche reveals the influence of the microenvironment on cellular phenotypes and further dissects patient-specific niches with distinct cellular compositions or phenotypic characteristics. By analyzing mouse liver spatial transcriptomics data from normal and early-onset liver failure samples, scNiche uncovers disease-specific liver injury niches and further delineates niche remodeling from normal liver to liver failure. Overall, scNiche enables decoding of the cellular microenvironment in tissues from single-cell spatial omics data.
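For intuition, a common baseline notion of a niche is the cell-type composition of each cell's spatial neighborhood, clustered across cells. The sketch below implements only that generic baseline on toy data; it is not scNiche's own (more elaborate) algorithm, and all names and sizes are illustrative.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    xy = rng.uniform(0, 100, size=(1000, 2))      # toy spatial coordinates
    cell_type = rng.integers(0, 5, size=1000)     # toy labels for 5 cell types

    # Each cell's niche profile: cell-type composition of its 15 nearest neighbors
    _, idx = NearestNeighbors(n_neighbors=15).fit(xy).kneighbors(xy)
    comp = np.stack([(cell_type[idx] == t).mean(axis=1) for t in range(5)], axis=1)

    # Cluster the composition vectors into candidate niches
    niche = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(comp)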
Project description: We present deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq) for co-mapping of mRNAs and proteins in a formaldehyde-fixed tissue slide via next-generation sequencing (NGS). Parallel microfluidic channels were used to deliver DNA barcodes to the surface of a tissue slide, and crossflow of two sets of barcodes, A1-50 and B1-50, followed by ligation in situ, yielded a 2D mosaic of tissue pixels, each containing a unique full barcode AB. Application to mouse embryos revealed major tissue types in early organogenesis as well as fine features such as microvasculature in the brain and pigmented epithelium in the eye field. Gene expression profiles in 10-μm pixels conformed to the clusters of single-cell transcriptomes, allowing for rapid identification of cell types and their spatial distributions. DBiT-seq can be adopted by researchers with no experience in microfluidics and may find applications in a range of fields, including developmental biology, cancer biology, neuroscience, and clinical pathology.
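The combinatorial logic of the crossflow barcoding is easy to see in code: two pools of 50 barcodes define a 50 × 50 grid, and each ligated pair AB addresses exactly one pixel. A toy Python sketch (random stand-in barcodes, not the actual DBiT-seq sequences):

    import random

    random.seed(0)

    def pool(n, k=8):
        # n distinct random k-mers standing in for a real barcode set
        out = set()
        while len(out) < n:
            out.add("".join(random.choices("ACGT", k=k)))
        return sorted(out)

    barcodes_a, barcodes_b = pool(50), pool(50)

    # Each in-situ-ligated barcode AB addresses one of 50 x 50 = 2,500 pixels
    pixel_of = {a + b: (i, j)
                for i, a in enumerate(barcodes_a)
                for j, b in enumerate(barcodes_b)}

    read = barcodes_a[7] + barcodes_b[42]   # spatial barcode from a toy read
    print(pixel_of[read])                   # -> (7, 42)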
Project description: Inferring missing links from the currently observed network is known as link prediction, which has tremendous real-world applications in biomedicine, e-commerce, social media, and criminal intelligence. Numerous methods have been proposed to solve the link prediction problem. Yet many of these methods are designed only for undirected networks and rely on domain-specific heuristics. Here we developed a new link prediction method based on deep generative models that does not rely on any domain-specific heuristic and works for general undirected or directed complex networks. Our key idea is to represent the adjacency matrix of a network as an image and then learn hierarchical feature representations of the image by training a deep generative model. These features correspond to structural patterns in the network at different scales, from small subgraphs to mesoscopic communities. When applied to various real-world networks from different domains, our method shows overall superior performance against existing methods.
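To make the adjacency-as-image idea concrete, the sketch below treats a toy network's adjacency matrix as a one-channel image and trains a small convolutional autoencoder whose reconstruction scores candidate links. This is a simplified stand-in for the paper's deep generative model; the architecture and sizes are assumptions.

    import torch
    import torch.nn as nn

    n = 64
    A = (torch.rand(n, n) < 0.05).float()
    A = ((A + A.T) > 0).float()    # toy undirected network
    A.fill_diagonal_(0)

    # Convolutional autoencoder over the adjacency "image"
    model = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 1, 3, padding=1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = A[None, None]    # shape (1, 1, n, n): a 1-channel image

    for step in range(300):
        loss = nn.functional.binary_cross_entropy_with_logits(model(x), x)
        opt.zero_grad()
        loss.backward()
        opt.step()

    scores = torch.sigmoid(model(x))[0, 0]    # high score + no edge => candidate link

For directed networks the same recipe applies without symmetrizing A.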
Project description: Motivation: Following many successful applications to image data, deep learning is now also increasingly considered for omics data. In particular, generative deep learning not only provides competitive prediction performance, but also allows for uncovering structure by generating synthetic samples. However, exploration and visualization are not as straightforward as with image applications.
Results: We demonstrate how log-linear models, fitted to the generated synthetic data, can be used to extract patterns from omics data learned by deep generative techniques. Specifically, interactions between the latent representations learned by the approaches and the generated synthetic data are used to determine sets of joint patterns. Distances of patterns with respect to the distribution of latent representations are then visualized in low-dimensional coordinate systems, e.g., for monitoring training progress. This is illustrated with simulated data and subsequently with cortical single-cell gene expression data. Using different kinds of deep generative techniques, specifically variational autoencoders and deep Boltzmann machines, the proposed approach highlights how the techniques uncover underlying structure. It facilitates the real-world use of such generative deep learning techniques to gain biological insights from omics data.
Availability and implementation: The code for the approach, as well as an accompanying Jupyter notebook illustrating its application, is available via the GitHub repository: https://github.com/ssehztirom/Exploring-generative-deep-learning-for-omics-data-by-using-log-linear-models.
Supplementary information: Supplementary data are available at Bioinformatics online.
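The core move is fitting a log-linear model, i.e., a Poisson regression on contingency-table cell counts, to binarized synthetic samples drawn from the trained generative model, then reading joint patterns off the interaction terms. A minimal sketch with statsmodels on simulated stand-in data (the real inputs would be samples generated by a VAE or deep Boltzmann machine):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    # Stand-in for binarized synthetic samples from a trained generative model:
    # g1 and g2 co-occur by construction, g3 is independent.
    g1 = rng.integers(0, 2, 5000)
    g2 = (g1 ^ (rng.random(5000) < 0.2)).astype(int)
    g3 = rng.integers(0, 2, 5000)
    df = pd.DataFrame({"g1": g1, "g2": g2, "g3": g3})

    # Log-linear model: Poisson regression on contingency-table cell counts;
    # large interaction terms flag joint patterns in the synthetic data.
    counts = df.value_counts().rename("n").reset_index()
    fit = smf.glm("n ~ g1 * g2 * g3", data=counts,
                  family=sm.families.Poisson()).fit()
    print(fit.params.filter(like=":"))    # g1:g2 should dominate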
Project description: Background: Exploring the cellular processes of genes from the perspective of biological networks is of great interest for understanding the properties of complex diseases and biological systems. Biological networks, such as protein-protein interaction networks and gene regulatory networks, provide insights into the molecular basis of cellular processes and often form functional clusters in different tissue and disease contexts.
Results: We present scGraph2Vec, a deep learning framework for generating informative gene embeddings. scGraph2Vec extends the variational graph autoencoder framework and integrates single-cell datasets with gene-gene interaction networks. We demonstrate that the gene embeddings are biologically interpretable and enable the identification of gene clusters representing functional or tissue-specific cellular processes. In comparisons with similar tools, scGraph2Vec clearly distinguished different gene clusters and aggregated more biologically functional genes. scGraph2Vec can be widely applied in diverse biological contexts. We illustrate that the embeddings generated by scGraph2Vec can infer disease-associated genes from genome-wide association study data (e.g., COVID-19 and Alzheimer's disease), identify additional driver genes in lung adenocarcinoma, and reveal regulatory genes responsible for maintaining or transitioning melanoma cell states.
Conclusions: scGraph2Vec not only reconstructs tissue-specific gene networks but also obtains a latent representation of genes reflecting their biological functions.
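Since the abstract names the variational graph autoencoder (VGAE) as the base framework, a minimal VGAE on a toy gene-gene network illustrates the setup. This uses PyTorch Geometric's standard VGAE API on random stand-in data; scGraph2Vec's own extensions (single-cell integration and so on) are not reproduced here.

    import torch
    from torch_geometric.nn import GCNConv, VGAE

    class Encoder(torch.nn.Module):
        def __init__(self, d_in, d_lat):
            super().__init__()
            self.conv = GCNConv(d_in, 2 * d_lat)
            self.mu = GCNConv(2 * d_lat, d_lat)
            self.logstd = GCNConv(2 * d_lat, d_lat)

        def forward(self, x, edge_index):
            h = self.conv(x, edge_index).relu()
            return self.mu(h, edge_index), self.logstd(h, edge_index)

    # Toy stand-ins: node features could summarize per-gene expression from a
    # single-cell dataset; edges come from a gene-gene interaction network.
    n_genes, d_in, d_lat = 300, 50, 16
    x = torch.randn(n_genes, d_in)
    edge_index = torch.randint(0, n_genes, (2, 2000))

    model = VGAE(Encoder(d_in, d_lat))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for epoch in range(100):
        z = model.encode(x, edge_index)
        loss = model.recon_loss(z, edge_index) + model.kl_loss() / n_genes
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        embeddings = model.encode(x, edge_index)    # gene embeddings to cluster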
Project description: Sequencing-based spatial transcriptomics (sST) enables transcriptome-wide gene expression mapping but falls short of reaching the optical resolution (200-300 nm) of imaging-based methods. Here, we present Seq-Scope-X (Seq-Scope-eXpanded), which empowers submicrometer-resolution Seq-Scope with tissue expansion to surpass this limitation. By physically enlarging tissues, Seq-Scope-X minimizes transcript diffusion effects and increases spatial feature density by an additional order of magnitude. In liver tissue, this approach resolves nuclear and cytoplasmic compartments in nearly every single cell, uncovering widespread differences between nuclear and cytoplasmic transcriptome patterns. Independently confirmed by imaging-based methods, these results suggest that individual hepatocytes can dynamically switch their metabolic roles. Seq-Scope-X is also applicable to non-hepatic tissues such as brain and colon, and can be modified to perform spatial proteomic analysis, simultaneously profiling hundreds of barcode-tagged antibody stains at microscopic resolutions in mouse spleens and human tonsils. These findings establish Seq-Scope-X as a transformative tool for ultra-high-resolution whole-transcriptome and proteome profiling, offering unparalleled spatial precision and advancing our understanding of cellular architecture, function, and disease mechanisms.
Project description: Photo-isolation chemistry (PIC) enables isolation of transcriptome information from locally defined areas by photo-irradiation. Here, we present an optimized PIC protocol for formalin-fixed frozen and paraffin-embedded mouse sections and fresh-frozen mouse sections. We describe tissue section preparation and permeabilization, followed by in situ reverse transcription using photo-caged primers. We then detail immunostaining and UV-mediated uncaging of the target areas, followed by linear amplification of uncaged cDNAs, library preparation, and quantification. This protocol can be applied to various animal tissue types. For complete details on the use and execution of this protocol, please refer to Honda et al. (2021).
Project description: Optical microscopy has so far been restricted to superficial layers, leaving many important biological questions unanswered. Random scattering causes the ballistic focus, which is conventionally used for image formation, to decay exponentially with depth. Optical imaging beyond the ballistic regime has been demonstrated by hybrid techniques that combine light with the deeper penetration of sound waves. Deep inside highly scattering media, however, the dimensions of the sound focus limit the achievable imaging resolution. Here we show that by iteratively focusing light into an ultrasound focus via phase conjugation, we can fundamentally overcome this resolution barrier in deep tissues while simultaneously increasing the focus-to-background ratio. We demonstrate fluorescence microscopy beyond the ballistic regime of light with a threefold improvement in resolution and a fivefold increase in contrast. This development opens up practical high-resolution fluorescence imaging in deep tissues.
Project description: The development of accurate and reliable variant effect prediction tools is important for research into human genetic diseases. A large number of predictors have been developed towards this goal, yet many suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation known as Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the data circularity that is common among supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server for predicting variant effects with MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for predicting the deleteriousness of variants.
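In Born-Again distillation, the student network shares the teacher's architecture and is trained to match the teacher's soft output distribution; applied here, a trained mutationTCN-like model would play the teacher. A minimal PyTorch sketch with toy shapes (the 21-way output standing in for an amino-acid alphabet is an assumption, as is everything else about the architecture):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_net():
        # Student and teacher share the same architecture (the "born-again" idea)
        return nn.Sequential(nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 21))

    teacher, student = make_net(), make_net()
    teacher.eval()    # stands in for an already-trained generative teacher

    x = torch.randn(256, 30)    # toy inputs (e.g., encoded sequence windows)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    for step in range(200):
        with torch.no_grad():
            soft = F.log_softmax(teacher(x), dim=-1)
        loss = F.kl_div(F.log_softmax(student(x), dim=-1), soft,
                        log_target=True, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()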