Project description:Recent advances in chromatin architecture profiling technologies, such as single-cell Hi-C (scHi-C), allow us to dissect the heterogeneity of higher-order chromosome structures across diverse cell states and different individuals. However, scHi-C experiments are still expensive and not immediately available for population-scale profiling. Here, we present scENCORE, a computational method to reconstruct personalized and cell-type-specific higher-order chromatin structures, such as A/B compartments, directly from more cost-effective and widely available single-cell epigenetic data (e.g., scATAC-seq). We apply scENCORE to scATAC-seq data from post-mortem brain prefrontal cortex samples and demonstrate its utility to 1) project megabase-scale chromatin regions into a lower-dimensional space by leveraging graph embedding technologies based on cell-type-specific co-variability patterns, 2) define A/B compartments via unsupervised clustering, and 3) align multiple graph embeddings to derive comparable chromatin representations and highlight dynamic chromatin compartments across cell states and individuals. Validated by Hi-C experiments using FACS-sorted cells, scENCORE can faithfully reconstruct cell-type-specific chromatin compartments. Furthermore, scENCORE uniformly constructs chromosome conformation across population-scale scATAC-seq data and discovers key 3D structural switching events associated with psychiatric disorders. In summary, scENCORE allows cost-effective, cell-type-specific, and personalized reconstruction that delineates higher-order chromatin structures.
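As a rough illustration of the idea of calling compartments from co-variability, the sketch below splits genomic bins into two groups using the leading eigenvector of a bin-by-bin Pearson correlation matrix. This is a swapped-in stand-in, not scENCORE's graph embedding, clustering, or alignment: the function names, the toy matrix, and the rule that the group with higher mean accessibility is labeled "A" are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def covariability_matrix(cells_by_bins):
    """Bin-by-bin correlation of accessibility across cells (rows = cells, columns = bins)."""
    bins = list(zip(*cells_by_bins))
    return [[pearson(a, b) for b in bins] for a in bins]

def leading_eigenvector(matrix, iterations=200):
    """Approximate the leading eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    return vec

def call_compartments(cells_by_bins):
    """Label each bin 'A' or 'B' from the sign of the leading eigenvector.

    Assumption: the eigenvector sign is arbitrary, so it is oriented here so
    that the group with higher mean accessibility is called 'A'.
    """
    vec = leading_eigenvector(covariability_matrix(cells_by_bins))
    n_cells = len(cells_by_bins)
    mean_signal = [sum(col) / n_cells for col in zip(*cells_by_bins)]
    pos = [m for m, v in zip(mean_signal, vec) if v > 0]
    neg = [m for m, v in zip(mean_signal, vec) if v <= 0]
    if pos and neg and sum(pos) / len(pos) < sum(neg) / len(neg):
        vec = [-v for v in vec]
    return ["A" if v > 0 else "B" for v in vec]

# Toy accessibility matrix: 4 cells x 4 megabase-scale bins (hypothetical values).
toy = [
    [5, 6, 1, 0],
    [1, 2, 4, 5],
    [6, 5, 0, 1],
    [2, 1, 5, 4],
]
labels = call_compartments(toy)
```

In this toy input, bins 0 and 1 vary together across cells and against bins 2 and 3, so the two pairs fall into opposite groups. Real data would need per-cell-type aggregation and normalization, which the sketch omits.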
Project description:We present MultiEditR, the first algorithm specifically designed to detect and quantify RNA editing from Sanger sequencing (z.umn.edu/multieditr). Although RNA editing is routinely evaluated by measuring the heights of peaks in Sanger sequencing traces, the accuracy and precision of this approach have yet to be evaluated against gold-standard next-generation sequencing methods. Through a comprehensive comparison to RNA-seq and amplicon-based deep sequencing, we show that MultiEditR is accurate, precise, and reliable for detecting endogenous and programmable RNA editing.
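The routine peak-height estimate mentioned above can be sketched as the edited base's peak height divided by the summed peak heights at that position. This is only the naive calculation, not MultiEditR's algorithm; the function name and the example heights are hypothetical.

```python
def percent_editing(peak_heights, edited_base):
    """Naive editing estimate at one trace position.

    peak_heights maps each base ('A', 'C', 'G', 'T') to its peak height.
    Returns the edited base's share of the total signal, as a percentage.
    """
    total = sum(peak_heights.values())
    if total == 0:
        raise ValueError("no signal at this position")
    return 100.0 * peak_heights.get(edited_base, 0) / total

# Hypothetical A-to-I site, where the edited base is read as G.
example = percent_editing({"A": 750, "C": 0, "G": 250, "T": 0}, "G")
```

A raw ratio like this does not account for background noise in the trace, which is one reason its accuracy needs checking against deep sequencing.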
Project description:Synthetic lethality (SL) has shown great promise for the discovery of novel targets in cancer. CRISPR double-knockout (CDKO) technologies can screen only several hundred genes and their combinations, not the whole genome. Therefore, good SL prediction models are needed to select genes and gene pairs for CDKO experiments. In this paper, we develop a novel multi-layer encoder for individual sample-specific SL prediction (MLEC-iSL). Unlike existing SL prediction models, MLEC-iSL is built to predict SL connectivity first. Because SL connectivity is scalable from existing genes in the training data to new genes in validation data, we hypothesize that MLEC-iSL has better SL prediction performance. MLEC-iSL has three encoders: a gene encoder, a graph encoder, and a transformer encoder. MLEC-iSL has high performance in K562 (AUPR, 0.73; AUC, 0.72) and Jurkat (AUPR, 0.73; AUC, 0.71) cells, while no existing method exceeds 0.62 AUPR or AUC in either cell line. An MLEC-iSL-guided CDKO experiment in 22Rv1 cells yielded a 46.8% SL ratio among its selected gene pairs. Six of the top ten SL connectivity hub genes are validated in 22Rv1 cells. The experiment reveals SL gene pairs and a dependency between apoptosis and mitosis cell death pathways.
Project description:HiCUP is a pipeline for processing sequence data generated by Hi-C, a technique used to investigate the three-dimensional organisation of a genome. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also provides an easy-to-interpret yet detailed quality control report that may be used by researchers to refine their experimental protocol for future studies. The software is freely available and has already been used for processing Hi-C data in several recently published peer-reviewed research articles. This experiment investigates the impact of using HiCUP to remove putative PCR amplification products in heavily duplicated Capture Hi-C libraries. Examination of three Capture Hi-C libraries.
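To illustrate what removing putative PCR duplicates can mean for paired Hi-C reads, the sketch below keeps one di-tag for each distinct pair of mapped read ends, treating the two ends as unordered. It is a simplified illustration under stated assumptions, not HiCUP's implementation: the `ReadEnd` type, the function names, and the toy coordinates are hypothetical.

```python
from typing import NamedTuple

class ReadEnd(NamedTuple):
    chrom: str
    pos: int
    strand: str

def ditag_key(end1, end2):
    """Order-independent key, so (a, b) and (b, a) count as the same di-tag."""
    return tuple(sorted((end1, end2)))

def remove_duplicates(ditags):
    """Keep the first di-tag seen for each distinct pair of mapped ends."""
    seen = set()
    unique = []
    for end1, end2 in ditags:
        key = ditag_key(end1, end2)
        if key not in seen:
            seen.add(key)
            unique.append((end1, end2))
    return unique

def duplication_rate(ditags):
    """Fraction of di-tags discarded as putative duplicates."""
    if not ditags:
        return 0.0
    return 1 - len(remove_duplicates(ditags)) / len(ditags)

# Toy library (hypothetical coordinates): one exact repeat and one end-swapped repeat.
a = ReadEnd("chr1", 1000, "+")
b = ReadEnd("chr1", 52000, "-")
c = ReadEnd("chr2", 300, "+")
library = [(a, b), (a, b), (b, a), (a, c)]
unique = remove_duplicates(library)
rate = duplication_rate(library)
```

Here two of the four di-tags are discarded, giving a duplication rate of 0.5. A heavily duplicated library would show a much larger gap between total and unique di-tag counts.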