Project description:Combining in-cell RT and Microwell-seq 1.0, we established Microwell-seq 2.0 for cost-effective HTS with scRNA-seq-based phenotyping. To assess the fidelity of the Microwell-seq 2.0, we performed a species-mixing experiment with cultured human (293T) and mouse (3T3) cells. Moreover, we assessed the platform on tissue cells with more heterogeneous cell types. By harnessing the power of Microwell-seq 2.0, we analyzed massively multiplexed chemical perturbation of human embryonic stem cells (hESCs) at the single-cell resolution.
Project description:RNA structure is a primary determinant of its function, and methods that merge chemical probing with next generation sequencing have created breakthroughs in the throughput and scale of RNA structure characterization. However, little work has been done to examine the effects of library preparation and sequencing on the measured chemical probe reactivities that encode RNA structural information. Here, we present the first analysis and optimization of these effects for selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). We first optimize SHAPE-Seq, and show that it provides highly reproducible reactivity data over a wide range of RNA structural contexts with no apparent biases. As part of this optimization, we present SHAPE-Seq v2.0, a 'universal' method that can obtain reactivity information for every nucleotide of an RNA without having to use or introduce a specific reverse transcriptase priming site within the RNA. We show that SHAPE-Seq v2.0 is highly reproducible, with reactivity data that can be used as constraints in RNA folding algorithms to predict structures on par with those generated using data from other SHAPE methods. We anticipate SHAPE-Seq v2.0 to be broadly applicable to understanding the RNA sequence-structure relationship at the heart of some of life's most fundamental processes.
Project description:High throughput mRNA expression profiling can be used to characterize the response of cell culture models to perturbations such as pharmacologic modulators and genetic perturbations. As profiling campaigns expand in scope, it is important to homogenize, summarize, and analyze the resulting data in a manner that captures significant biological signals in spite of various noise sources such as batch effects and stochastic variation. We used the L1000 platform for large-scale profiling of 978 representative genes across thousands of compound treatments. Here, a method is described that uses deep learning techniques to convert the expression changes of the landmark genes into a perturbation barcode that reveals important features of the underlying data, performing better than the raw data in revealing important biological insights. The barcode captures compound structure and target information, and predicts a compound's high throughput screening promiscuity, to a higher degree than the original data measurements, indicating that the approach uncovers underlying factors of the expression data that are otherwise entangled or masked by noise. Furthermore, we demonstrate that visualizations derived from the perturbation barcode can be used to more sensitively assign functions to unknown compounds through a guilt-by-association approach, which we use to predict and experimentally validate the activity of compounds on the MAPK pathway. The demonstrated application of deep metric learning to large-scale chemical genetics projects highlights the utility of this and related approaches to the extraction of insights and testable hypotheses from big, sometimes noisy data.
Project description:The rapid spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is due to the high rates of transmission by individuals who are asymptomatic at the time of transmission1,2. Frequent, widespread testing of the asymptomatic population for SARS-CoV-2 is essential to suppress viral transmission. Despite increases in testing capacity, multiple challenges remain in deploying traditional reverse transcription and quantitative PCR (RT-qPCR) tests at the scale required for population screening of asymptomatic individuals. We have developed SwabSeq, a high-throughput testing platform for SARS-CoV-2 that uses next-generation sequencing as a readout. SwabSeq employs sample-specific molecular barcodes to enable thousands of samples to be combined and simultaneously analyzed for the presence or absence of SARS-CoV-2 in a single run. Importantly, SwabSeq incorporates an in vitro RNA standard that mimics the viral amplicon, but can be distinguished by sequencing. This standard allows for end-point rather than quantitative PCR, improves quantitation, reduces requirements for automation and sample-to-sample normalization, enables purification-free detection, and gives better ability to call true negatives. After setting up SwabSeq in a high-complexity CLIA laboratory, we performed more than 80,000 tests for COVID-19 in less than two months, confirming in a real world setting that SwabSeq inexpensively delivers highly sensitive and specific results at scale, with a turn-around of less than 24 hours. Our clinical laboratory uses SwabSeq to test both nasal and saliva samples without RNA extraction, while maintaining analytical sensitivity comparable to or better than traditional RT-qPCR tests. Moving forward, SwabSeq can rapidly scale up testing to mitigate devastating spread of novel pathogens.
Project description:Broad-scale protein-protein interaction mapping is a major challenge given the cost, time, and sensitivity constraints of existing technologies. Here, we present a massively multiplexed yeast two-hybrid method, CrY2H-seq, which uses a Cre recombinase interaction reporter to intracellularly fuse the coding sequences of two interacting proteins and next-generation DNA sequencing to identify these interactions en masse. We applied CrY2H-seq to investigate sparsely annotated Arabidopsis thaliana transcription factors interactions. By performing ten independent screens testing a total of 36 million binary interaction combinations, and uncovering a network of 8,577 interactions among 1,453 transcription factors, we demonstrate CrY2H-seq's improved screening capacity, efficiency, and sensitivity over those of existing technologies. The deep-coverage network resource we call AtTFIN-1 recapitulates one-third of previously reported interactions derived from diverse methods, expands the number of known plant transcription factor interactions by three-fold, and reveals previously unknown family-specific interaction module associations with plant reproductive development, root architecture, and circadian coordination.
Project description:High throughput sequencing techniques are poorly adapted for in vivo studies of parasites, which require prior in vitro culturing and purification. Trypanosomatids, a group of kinetoplastid protozoans, possess a distinctive feature in their transcriptional mechanism whereby a specific Spliced Leader (SL) sequence is added to the 5'end of each mRNA by trans-splicing. This allows to discriminate Trypansomatid RNA from mammalian RNA and forms the basis of our new multiplexed protocol for high-throughput, selective RNA-sequencing called SL-seq. We provided a proof-of-concept of SL-seq in Leishmania donovani, the main causative agent of visceral leishmaniasis in humans, and successfully applied the method to sequence Leishmania mRNA directly from infected macrophages and from highly diluted mixes with human RNA. mRNA profiles obtained with SL-seq corresponded largely to those obtained from conventional poly-A tail purification methods, indicating both enumerate the same mRNA pool. However, SL-seq offers additional advantages, including lower sequencing depth requirements, fast and simple library prep and high resolution splice site detection. SL-seq is therefore ideal for fast and massive parallel sequencing of parasite transcriptomes directly from host tissues. Since SLs are also present in Nematodes, Cnidaria and primitive chordates, this method could also have high potential for transcriptomics studies in other organisms.
Project description:The functions of RNA molecules are intimately linked to their ability to fold into complex secondary and tertiary structures. Thus, understanding how these molecules fold is essential to determining how they function. Current methods for investigating RNA structure often use small molecules, enzymes, or ions that cleave or modify the RNA in a solvent-accessible manner. While these methods have been invaluable to understanding RNA structure, they can be fairly labor intensive and often focus on short regions of single RNAs. Here we present a new method (Mod-seq) and data analysis pipeline (Mod-seeker) for assaying the structure of RNAs by high-throughput sequencing. This technique can be utilized both in vivo and in vitro, with any small molecule that modifies RNA and consequently impedes reverse transcriptase. As proof-of-principle, we used dimethyl sulfate (DMS) to probe the in vivo structure of total cellular RNAs in Saccharomyces cerevisiae. Mod-seq analysis simultaneously revealed secondary structural information for all four ribosomal RNAs and 32 additional noncoding RNAs. We further show that Mod-seq can be used to detect structural changes in 5.8S and 25S rRNAs in the absence of ribosomal protein L26, correctly identifying its binding site on the ribosome. While this method is applicable to RNAs of any length, its high-throughput nature makes Mod-seq ideal for studying long RNAs and complex RNA mixtures.
Project description:Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factor are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotide with massively parallel technology and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R(2 )= 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities.
Project description:Multiplexed assays allow functional testing of large synthetic libraries of genetic elements, but are limited by the designability, length, fidelity and scale of the input DNA. Here, we improve DropSynth, a low-cost, multiplexed method that builds gene libraries by compartmentalizing and assembling microarray-derived oligonucleotides in vortexed emulsions. By optimizing enzyme choice, adding enzymatic error correction and increasing scale, we show that DropSynth can build thousands of gene-length fragments at >20% fidelity.