Project description:Background: The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. Results: In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtain evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Conclusions: Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provide a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.
Project description:Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Project description:Duck (Anas platyrhynchos), one of the most economically important waterfowl, is an ideal model for studying the immune protection mechanism of birds. An incomplete duck reference genome and very limited availability of full-length cDNAs have hindered the identification of alternatively spliced transcripts and slowed down many basic studies in ducks. We applied PacBio Iso-Seq technologies to multiple tissues from duck for use in transcriptome sequencing. We obtained 199,993 full-length transcripts and comprehensively annotated these transcripts. A total of 23,755 lncRNAs were predicted from all identified transcripts, and 35,031 alternative splicing events, which were divided into 5 models, were accurately predicted from 3,346 genes. Our data constitute a large increase in the known number of both lncRNAs and alternatively spliced transcripts in duck and play an important role in improving current genome annotation. In addition, the data will be extremely useful for functional studies in other birds.
Project description:The ontogeny and dynamics of mtDNA heteroplasmy remain unclear due to limitations of current mtDNA sequencing methods. We developed individual Mitochondrial Genome sequencing (iMiGseq) of full-length mtDNA for ultra-sensitive variant detection, complete haplotyping, and unbiased evaluation of heteroplasmy levels, all at the individual mtDNA molecule level. iMiGseq uncovered unappreciated levels of heteroplasmic variants in single cells well below the conventional NGS detection limit and provided accurate quantitation of heteroplasmy level. iMiGseq resolved the complete haplotype of individual mtDNA in single oocytes and revealed genetic linkage of de novo mutations. iMiGseq detected sequential acquisition of detrimental mutations, including large deletions, in defective mtDNA in NARP/Leigh syndrome patient-derived induced pluripotent stem cells. iMiGseq identified unintended heteroplasmy shifts in mitoTALEN editing, while showing no appreciable level of unintended mutations in DdCBE-mediated mtDNA base editing. Therefore, iMiGseq could not only help elucidate the mitochondrial etiology of diseases, but also evaluate the safety of various mtDNA editing strategies.
Project description:The high-throughput department of DKMS Life Science Lab encounters novel human leukocyte antigen (HLA) alleles on a daily basis. To characterise these alleles, we have developed a system to sequence the whole gene from 5'- to 3'-UTR for the HLA loci A, B, C, DQB1 and DPB1 for submission to the European Molecular Biology Laboratory - European Nucleotide Archive (EMBL-ENA) and the IPD-IMGT/HLA Database. Our workflow is based on a dual redundant sequencing strategy. Using shotgun sequencing on an Illumina MiSeq instrument and single molecule real-time (SMRT) sequencing on a PacBio RS II instrument, we are able to achieve highly accurate HLA full-length consensus sequences. Remaining conflicts are resolved using the R package DR2S (Dual Redundant Reference Sequencing). Given the relatively high throughput of this strategy, we have developed the semi-automated web service TypeLoader to aid in the submission of sequences to the EMBL-ENA and the IPD-IMGT/HLA Database. In the IPD-IMGT/HLA Database release 3.24.0 (April 2016; prior to the submission of the sequences described here), only 5.2% of all known HLA alleles had been fully characterised together with intronic and UTR sequences. So far, we have applied our strategy to characterise and submit 1056 HLA alleles, thereby more than doubling the number of fully characterised alleles. Given the increasing application of next generation sequencing (NGS) for full gene characterisation in clinical practice, extending the HLA database concomitantly is highly desirable. Therefore, we propose this dual redundant sequencing strategy as a workflow for submission of novel full-length alleles and characterisation of sequences that are as yet incomplete. This would help to mitigate the predominance of partially known alleles in the database.
Project description:Sequencing the entire RNA molecule leads to a better understanding of the transcriptome architecture. SMARTer (Switching Mechanism at 5'-End of RNA Template) is a technology aimed at generating full-length cDNA from low amounts of mRNA for sequencing by short-read sequencers such as those from Illumina. However, short-read sequencing such as Illumina technology includes fragmentation that results in bias and information loss. Here, we built a pipeline, UNAGI or UNAnnotated Gene Identifier, to process long reads obtained with nanopore sequencing and compared this pipeline with the standard Illumina pipeline by studying the Saccharomyces cerevisiae transcriptome in full-length cDNA samples generated from two different biological samples: haploid and diploid cells. Additionally, we processed the long reads with another long-read tool, FLAIR. Our strand-aware method revealed significant differential gene expression that was masked in Illumina data by antisense transcripts. Our pipeline, UNAGI, outperformed the Illumina pipeline and FLAIR in transcript reconstruction (sensitivity and specificity of 80% and 40% for UNAGI vs. 18% and 34% for the Illumina pipeline and 79% and 32% for FLAIR). Moreover, UNAGI discovered 3877 unannotated transcripts, including 1282 intergenic transcripts, while the Illumina pipeline discovered only 238 unannotated transcripts. For isoform profiling, UNAGI also outperformed the Illumina pipeline and FLAIR in terms of sensitivity (91% vs. 82% and 63%, respectively). However, the low accuracy of nanopore sequencing led to a narrower gap in specificity with the Illumina pipeline (70% vs. 63%) and to a very large gap with FLAIR (70% vs. 0.02%).
Project description:To investigate the effect of oxidative stress on RNA dynamics, we treated HeLa cells with 500 µM arsenite. We then explored the role of XRN1, CHX and ISRIB in mRNA shortening using data obtained from dRNA-seq.
Project description:Adult T-cell lymphoma/leukemia (ATL) is a rare T-cell lymphoproliferative neoplasm caused by human T-lymphotropic virus 1. In its more common, aggressive forms, ATL carries one of the poorest prognoses of the non-Hodgkin lymphomas. The disease has clinical subtypes (ie, acute, lymphoma, chronic, and smoldering forms) defined by the presenting features, and therefore, the clinical course can vary. For the smoldering and lower-risk chronic forms, combinations involving antiviral therapies have shown some success. However, in many patients, the more indolent forms will evolve into the more aggressive subtypes. In the more aggressive acute, lymphoma, and higher-risk chronic forms, the literature supports initial treatment with combination chemotherapy followed by allogeneic transplantation as a potentially curative approach. Recently, mogamulizumab and lenalidomide have shown promise in the treatment of ATL. With better understanding of the molecular drivers of this disease, we hope that the therapeutic landscape will continue to expand.