Project description: Genetic variants that impact gene regulation are important contributors to human phenotypic variation. For this reason, considerable efforts have been made to identify genetic associations with differences in mRNA levels of nearby genes, namely, cis expression quantitative trait loci (eQTLs). The phenotypic consequences of eQTLs are presumably due, in most cases, to their ultimate effects on protein expression levels. Yet only a few studies have quantified the impact of genetic variation on protein levels directly. It remains unclear how faithfully eQTLs are reflected at the protein level, and whether there is a significant layer of cis regulatory variation acting primarily on translation or steady-state protein levels. To address these questions, we measured ribosome occupancy by high-throughput sequencing, and relative protein levels by high-resolution quantitative mass spectrometry, in a panel of lymphoblastoid cell lines (LCLs) in which we had previously measured transcript expression using RNA sequencing. We then mapped genetic variants that are associated with changes in transcript expression (eQTLs), ribosome occupancy (rQTLs), or protein abundance (pQTLs). Most of the QTLs we detected are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, we found that eQTLs tend to have significantly reduced effect sizes on protein levels, suggesting that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we confirmed the presence of a class of cis QTLs that specifically affect protein abundance with little or no effect on mRNA levels; most of these QTLs have little effect on ribosome occupancy, and hence may arise from differences in post-translational regulation.
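The effect-size comparison described in the project description above (eQTL effects being attenuated at the protein level) can be illustrated with a minimal sketch. It assumes a simple additive model in which a phenotype is regressed on genotype dosage (0, 1, or 2 copies of an allele) and the least-squares slope is taken as the effect size. The function name `qtl_effect_size` and the toy values are hypothetical and are not taken from the study, whose actual mapping procedure is not specified here.

```python
from statistics import mean


def qtl_effect_size(dosages, phenotype):
    """Least-squares slope of a phenotype on genotype dosage (0, 1, or 2).

    Assumption: a plain additive linear model with no covariates.
    """
    if len(dosages) != len(phenotype):
        raise ValueError("dosages and phenotype must have the same length")
    g_bar = mean(dosages)
    p_bar = mean(phenotype)
    # Sum of squared deviations of the genotype dosages.
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype shows no variation across samples")
    # Sum of cross-deviations between dosage and phenotype.
    sxy = sum((g - g_bar) * (p - p_bar) for g, p in zip(dosages, phenotype))
    return sxy / sxx


# Toy, made-up data for six cell lines (illustration only).
dosages = [0, 0, 1, 1, 2, 2]
mrna = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]      # hypothetical log mRNA levels
protein = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]   # hypothetical log protein levels

beta_mrna = qtl_effect_size(dosages, mrna)
beta_protein = qtl_effect_size(dosages, protein)
```

In this toy example the slope for the protein phenotype comes out smaller than the slope for the mRNA phenotype, which is the pattern the abstract describes as attenuation or buffering.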
Project description: We have developed FusionSeq to identify fusion transcripts from paired-end RNA-sequencing. FusionSeq includes filters to remove spurious candidate fusions arising from artifacts such as misalignments or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also has a module to identify exact sequences at breakpoint junctions. FusionSeq detected known and novel fusions in a specially sequenced calibration data set, including 8 cancers with and without known rearrangements.
Project description: This SuperSeries is composed of the SubSeries listed below. MicroRNAs are predicted to regulate the expression of more than 60% of mammalian genes and play fundamental roles in most biological processes. Deregulation of miRNA expression is a hallmark of most cancers, and further investigation of the mechanisms controlling miRNA biogenesis is needed. The dsRNA-binding protein NF90 has been shown to act as a competitor of Microprocessor for a limited number of pri-miRNAs. Here, we show that NF90 has a more widespread effect on pri-miRNA biogenesis than previously thought. Genome-wide approaches revealed that NF90 is associated with the stem region of 38 pri-miRNAs, in a manner that is largely exclusive of Microprocessor. Following loss of NF90, 25 NF90-bound pri-miRNAs showed increased abundance of mature miRNA products. NF90-targeted pri-miRNAs are highly stable, having a lower free energy and fewer mismatches compared to all pri-miRNAs. Mutations leading to less stable structures reduced NF90 binding, while increasing pri-miRNA stability led to acquisition of NF90 association, as determined by RNA EMSA. NF90-bound and modulated pri-miRNAs are embedded in introns of host genes, and expression of several of these host genes is concomitantly modulated, including an oncogene implicated in metastasis of hepatocellular carcinoma, TIAM2. These data suggest that NF90 controls the processing of a subset of highly stable, intronic miRNAs.
Project description: RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and in multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence sub-optimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a non-parametric, highly flexible manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows the study of alternative splicing patterns in individual samples and can also be the basis for downstream analyses. We found a several-fold improvement in estimation mean squared error compared with popular approaches in simulations, and substantially higher consistency between replicates in experimental data. Our findings indicate the need for adjusting the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper.
Project description: RNA-seq was used to generate an extensive map of the Drosophila melanogaster transcriptome by broad sampling of 10 developmental stages. In total, 142.2 million uniquely mapped 64-100-bp paired-end reads were generated on the Illumina GA II, yielding 356× sequencing coverage. More than 95% of FlyBase genes and 90% of splicing junctions were observed. Modifications to 30% of FlyBase gene models were made by extension of untranslated regions, inclusion of novel exons, and identification of novel splicing events. A total of 319 novel transcripts were identified, representing a 2% increase over the current annotation. Alternative splicing was observed in 31% of D. melanogaster genes, a 38% increase over previous estimates, but significantly less than that observed in higher organisms. Much of this splicing is subtle, such as tandem alternative splice sites.
Project description: To investigate the effect of Tup1, Xbp1, Isw2, and Sds3 on transcription, we generated knockout strains and measured their transcript abundance relative to wild type during the diauxic shift and stationary phase.
Project description: Motivation: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment. Methods: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization (EM) method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment. Results: The method was applied to 2 x 35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the EM algorithm in the presence of alternative paths in the splice graph was validated by qRT-PCR experiments on eight exon-skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study by Maher et al. (2009). Availability: Software available at http://www.netlab.uky.edu/p/bioinfo/MapSplice/PER.
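The expectation maximization step mentioned in the Methods of the description above can be sketched in a heavily simplified form. This is not the published method: it assumes each fragment comes with a list of candidate splice-junction labels, treats every candidate as equally compatible with the fragment, and estimates only a relative weight per junction. The function name `em_assign` and the toy input are hypothetical.

```python
def em_assign(candidates, n_iter=50):
    """Toy EM over ambiguous fragment-to-junction assignments.

    candidates: list of non-empty lists; each inner list holds the candidate
    junction labels for one fragment (an assumption of this sketch).
    Returns (theta, assignment): estimated junction weights and the most
    probable candidate for each fragment.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every fragment needs at least one candidate")
    labels = sorted({j for cands in candidates for j in cands})
    # Start from uniform junction weights.
    theta = {j: 1.0 / len(labels) for j in labels}
    n = len(candidates)
    for _ in range(n_iter):
        # E-step: split each fragment across its candidates in proportion
        # to the current junction weights.
        counts = {j: 0.0 for j in labels}
        for cands in candidates:
            total = sum(theta[j] for j in cands)
            for j in cands:
                counts[j] += theta[j] / total
        # M-step: new weight is the expected share of fragments per junction.
        theta = {j: counts[j] / n for j in labels}
    # Assign each fragment to its highest-weight candidate.
    assignment = [max(cands, key=lambda j: theta[j]) for cands in candidates]
    return theta, assignment


# Toy input: three fragments support only junction "A", one supports only
# "B", and one is ambiguous between the two.
fragments = [["A"], ["A"], ["A"], ["A", "B"], ["B"]]
theta, assignment = em_assign(fragments)
```

With this input the ambiguous fragment is resolved toward the junction that has more unambiguous support, which is the general behaviour an EM-based assignment is meant to provide when alternative paths exist.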