Project description: Functional characterization of pseudouridine (Ψ) in mammalian messenger RNA (mRNA) has been hampered by the lack of a quantitative method that maps Ψ across the whole transcriptome. We report bisulfite-induced deletion sequencing (BID-seq), which uses a bisulfite-mediated reaction to stoichiometrically convert Ψ into deletions upon reverse transcription. BID-seq enabled detection of abundant Ψ sites, with stoichiometry information, in several human cell lines and 12 different mouse tissues using 10-20 ng of input RNA. We uncovered consensus sequences for Ψ in mammalian mRNA and assigned individual Ψ sites to the different ‘writer’ proteins that deposit them. Our results reveal a transcript-stabilizing role for Ψ sites installed by TRUB1 in human cancer cells. We also detected Ψ within stop codons of mammalian mRNA and confirmed its role in promoting stop codon read-through in vivo. This sensitive and comprehensive method for detecting Ψ sets the stage for future investigations of the roles of Ψ in diverse biological processes.
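BID-seq reports stoichiometry because a Ψ site produces a deletion in a fraction of reads, and that fraction tracks how many transcripts carry Ψ. The sketch below shows the arithmetic under stated assumptions. The deletion ratio at a site is deletions / (deletions + matches). That ratio is then mapped to a Ψ fraction through a calibration curve, assumed here to come from spike-ins of known Ψ stoichiometry. The function names and the idea of passing the curve as two lists are hypothetical. This is not the authors' pipeline.

```python
import math

import numpy as np


def deletion_ratio(n_deletion: int, n_match: int) -> float:
    """Fraction of reads carrying a deletion at a candidate U position."""
    total = n_deletion + n_match
    if total == 0:
        return math.nan  # no coverage: ratio undefined
    return n_deletion / total


def estimate_psi_fraction(del_ratio: float,
                          calib_del_ratios: list[float],
                          calib_psi_fractions: list[float]) -> float:
    """Convert a deletion ratio into an estimated Psi fraction by linear
    interpolation on a calibration curve (hypothetical: deletion ratios
    measured on spike-ins of known Psi stoichiometry in the same sequence
    context). Ratios outside the curve are clamped to its end points."""
    xp = np.asarray(calib_del_ratios, dtype=float)
    fp = np.asarray(calib_psi_fractions, dtype=float)
    order = np.argsort(xp)  # np.interp requires increasing x coordinates
    return float(np.interp(del_ratio, xp[order], fp[order]))
```

Linear interpolation is the simplest choice here. In practice the calibration would likely need to be specific to each sequence context, because deletion efficiency can depend on the bases flanking Ψ.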
Project description: To identify aberrant splicing isoforms and potential neoantigens, we performed full-length cDNA sequencing of lung adenocarcinoma cell lines using the MinION long-read sequencer. We constructed a comprehensive catalog of aberrant splicing isoforms and used proteome analysis to detect isoform-specific peptides.
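Detecting isoform-specific peptides typically starts with an in silico digest of the protein translated from each aberrant isoform, followed by removal of every peptide already present in the reference proteome. The sketch below follows that approach. It assumes the standard trypsin rule (cleave after K or R, but not before P) and an illustrative 7-25 residue length window. Both function names and the length defaults are hypothetical and are not taken from the study.

```python
import re


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 25) -> set[str]:
    """In silico trypsin digest: cleave after K or R unless the next residue
    is P, keeping peptides within a length window typical for MS detection."""
    # Zero-width split points (Python 3.7+); a trailing K/R yields an empty
    # final piece, which the length filter discards.
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in peptides if min_len <= len(p) <= max_len}


def isoform_specific_peptides(isoform_protein: str,
                              reference_proteins: list[str]) -> set[str]:
    """Tryptic peptides of an aberrant isoform that do not occur anywhere
    in the reference proteome (checked as substrings, so peptides that are
    merely cleaved differently in the reference are also excluded)."""
    return {pep for pep in tryptic_digest(isoform_protein)
            if not any(pep in ref for ref in reference_proteins)}
```

The substring check is deliberately conservative. A peptide counts as isoform-specific only if it appears nowhere in any reference protein, which matters when searching for candidate neoantigens.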
Project description: Recently developed methods that partition long genomic DNA fragments and barcode the shorter fragments derived from them have succeeded in retaining long-range information in short sequencing reads. These so-called read-cloud approaches represent a powerful, accurate, and cost-effective alternative to single-molecule long-read sequencing. We developed GROC-SVs, software that takes advantage of read clouds for structural variant detection and assembly. We applied the method to two 10x Genomics data sets: a chromothriptic sarcoma with several spatially separated samples, and a breast cancer cell line. All samples were Illumina-sequenced to high coverage. We compared the results with short-fragment data from the same samples and validated them with mate-pair data from a subset of the sarcoma samples. These comparisons show substantially higher breakpoint-detection specificity than short-fragment sequencing at comparable sensitivity, and conversely higher sensitivity at comparable specificity. The embedded long-range information also facilitates sequence assembly of a large fraction of the breakpoints. Importantly, consecutive breakpoints that lie closer together than the average length of the input DNA molecules can be assembled jointly, and their order and arrangement can be reconstructed; some events exhibit remarkable complexity. These features enabled an analysis of the structural evolution of the sarcoma. In the chromothripsis event, rearrangements occurred before copy number amplifications. Using a phylogenetic tree built from point mutation data, we show that single nucleotide variants and structural variants are not correlated. We predict significant future advances in structural variant science using 10x data analyzed with GROC-SVs and other read-cloud-specific methods.
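The long-range signal in read clouds comes from barcodes. A long DNA molecule that spans a breakpoint places the same barcode in two genomic windows that are far apart in the reference. The sketch below scores that evidence with a hypergeometric tail probability: how likely two windows are to share at least the observed number of barcodes if barcodes were distributed independently. This simplified statistic and the function names are assumptions made for illustration. It is not the scoring model GROC-SVs actually implements.

```python
from math import comb


def shared_barcode_pvalue(total: int, n_a: int, n_b: int, shared: int) -> float:
    """Upper-tail hypergeometric probability P(X >= shared): the chance that
    two windows holding n_a and n_b of `total` barcodes share at least
    `shared` barcodes if barcodes were distributed independently."""
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, i) * comb(total - n_a, n_b - i)
               for i in range(shared, upper + 1))
    return tail / comb(total, n_b)


def barcode_overlap(barcodes_a: set[str], barcodes_b: set[str],
                    total: int) -> tuple[int, float]:
    """Count barcodes observed in both windows and score the overlap."""
    shared = len(barcodes_a & barcodes_b)
    return shared, shared_barcode_pvalue(total, len(barcodes_a),
                                         len(barcodes_b), shared)
```

Exact integer arithmetic via `math.comb` avoids underflow for small windows. Genome-wide scans over many window pairs would also need multiple-testing control.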
Project description: Alternative splicing contributes to transcriptomic complexity and plays a role in regulating cellular identity and function. However, correctly assembling and quantifying transcripts of complex loci from short-read sequencing is non-trivial. Recent long-read sequencing methods, such as those from ONT and PacBio, can overcome these problems by sequencing full-length transcripts. Activation of brown adipose tissue, e.g., by exposure to reduced ambient temperature (cold), positively affects metabolism by increasing energy expenditure and releasing endocrine factors, and has been shown to involve specific alternative splicing events. Here we assessed key features of ONT long-read sequencing protocols relative to Illumina short-read sequencing: (i) alignment characteristics to the reference genome and transcriptome, (ii) gene and transcript detection and quantification, (iii) detection of differential gene and transcript expression events, (iv) transcriptome reannotation, and (v) detection of differential transcript usage events. We find that ONT long-read sequencing is advantageous for transcriptome reassembly, especially when the library is enriched for full-length reads. Because it yields higher read counts, Illumina sequencing has greater statistical power for calling differentially expressed or used features. Long-read sequencing, in contrast, carries a lower risk of calling false-positive events because its reads can more often be mapped to transcripts unambiguously. Finally, we describe novel transcript isoforms in cold-activated murine interscapular brown adipose tissue (iBAT) reassembled from ONT long reads.
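The false-positive argument rests on read-to-transcript ambiguity. A short read often matches several isoforms of a gene, whereas a full-length read usually matches only one. The sketch below makes this concrete. It represents each aligned read as the set of transcripts it is compatible with, counts reads that can be assigned uniquely, and reports the fraction that remain ambiguous. The compatibility-set representation and the function name are illustrative assumptions. The study's actual quantification tools are not described in the text.

```python
from collections import Counter


def assignment_summary(read_compat: list[set[str]]) -> tuple[Counter, float]:
    """Count reads uniquely compatible with a single transcript and report
    the fraction of all reads that are ambiguous (compatible with >1).
    Reads with an empty compatibility set (unassigned) count as neither."""
    unique: Counter = Counter()
    ambiguous = 0
    for compat in read_compat:
        if len(compat) == 1:
            unique[next(iter(compat))] += 1
        elif len(compat) > 1:
            ambiguous += 1
    n = len(read_compat)
    return unique, (ambiguous / n if n else 0.0)
```

In a real pipeline, ambiguous reads are distributed among isoforms by a statistical model. That distribution step is where short-read counts can produce spurious isoform-level differences.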
Project description: Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneity within tissues. However, when combined with short-read sequencing, these approaches measure only the distal parts of a transcript, so splicing and sequence diversity information is lost for most of the transcript. Applying long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Several approaches use additional short-read sequencing to error-correct the barcode and UMI sequences. These techniques, however, require every library to be sequenced with both short- and long-read platforms. Here we introduce a novel approach, single-cell Barcode UMI Correction sequencing (scBUC-seq), which efficiently error-corrects barcode and UMI oligonucleotide sequences synthesized from blocks of dimeric nucleotides. The method can be applied to both short-read and long-read sequencing. It allows users to recover more reads per cell and, for the first time, enables direct single-cell Nanopore sequencing. We illustrate the method with species-mixing experiments to evaluate barcode assignment accuracy, multiple myeloma cell lines to evaluate differential isoform usage, and Ewing’s sarcoma cells to demonstrate Ig fusion transcript analysis.
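The text states only that barcodes and UMIs are synthesized from dimeric nucleotide blocks. The decoding below is therefore a hedged sketch under one assumed scheme, not scBUC-seq's published algorithm. Each intended base is written as a homodimer (AA, CC, GG or TT). A block whose two bases disagree reveals a sequencing error and is masked as `N`. The partially masked barcode is then matched against a whitelist of expected barcodes. The whitelist and both function names are hypothetical.

```python
from typing import Optional


def decode_dimer_blocks(seq: str) -> str:
    """Collapse a segment synthesized as homodimer blocks (AA, CC, GG, TT)
    into one base per block. A block whose two bases disagree carries a
    detectable error and is recorded as 'N'."""
    if len(seq) % 2:
        raise ValueError("dimer-encoded segment must have even length")
    return "".join(seq[i] if seq[i] == seq[i + 1] else "N"
                   for i in range(0, len(seq), 2))


def match_whitelist(decoded: str, whitelist: list[str]) -> Optional[str]:
    """Return the single whitelist barcode that agrees with every non-N
    position, or None if no barcode or more than one barcode fits."""
    hits = [bc for bc in whitelist
            if len(bc) == len(decoded)
            and all(d == "N" or d == c for d, c in zip(decoded, bc))]
    return hits[0] if len(hits) == 1 else None
```

This sketch has two limitations. An insertion or deletion shifts the dimer frame, and a block in which both bases are miscalled to the same wrong base (AA read as CC) passes undetected. Nanopore-scale error rates would require handling both cases.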
Project description: Deconvolution methods infer quantitative cell type estimates from bulk measurements of mixed samples, such as blood and tissue. DNA methylation sequencing measures multiple CpGs per read, but few existing deconvolution methods leverage this within-read information. We developed CelFiE-ISH, which extends an existing method (CelFiE) to use within-read haplotype information. CelFiE-ISH outperforms CelFiE and other existing methods, achieving 30% better accuracy and more sensitive detection of rare cell types. We also demonstrate the importance of marker selection and of tailoring markers to haplotype-aware methods. Although we use gold-standard short-read sequencing data here, haplotype-aware methods will be well suited to long-read sequencing.
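Using within-read information means scoring each read's joint methylation pattern across its CpGs, rather than pooling CpGs into per-site averages. The sketch below is a simplified read-level mixture model fitted by expectation-maximization. It is not CelFiE-ISH's actual model. It assumes three things: reference methylation probabilities (`betas`) are known for every cell type; each read originates from a single cell type; and CpGs on a read are independent given that cell type. The function names are hypothetical.

```python
import numpy as np


def read_log_likelihoods(reads: np.ndarray, betas: np.ndarray) -> np.ndarray:
    """reads: (n_reads, n_cpgs), 1 = methylated, 0 = unmethylated,
    NaN = CpG not covered by the read. betas: (n_types, n_cpgs).
    Returns (n_reads, n_types) joint log-likelihood of each read pattern."""
    betas = np.clip(betas, 1e-6, 1 - 1e-6)
    covered = ~np.isnan(reads)
    x = np.nan_to_num(reads)  # uncovered CpGs are masked out below
    meth = x * covered
    unmeth = (1 - x) * covered
    return meth @ np.log(betas).T + unmeth @ np.log1p(-betas).T


def estimate_proportions(reads, betas, n_iter: int = 200,
                         tol: float = 1e-8) -> np.ndarray:
    """EM estimate of cell-type mixing proportions from read-level data."""
    reads = np.asarray(reads, dtype=float)
    betas = np.asarray(betas, dtype=float)
    ll = read_log_likelihoods(reads, betas)
    alpha = np.full(betas.shape[0], 1.0 / betas.shape[0])
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each type
        log_post = ll + np.log(np.maximum(alpha, 1e-300))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: proportions are the mean posterior responsibilities
        new_alpha = resp.mean(axis=0)
        converged = np.abs(new_alpha - alpha).max() < tol
        alpha = new_alpha
        if converged:
            break
    return alpha
```

A read covering several informative CpGs gets a sharply peaked posterior, which is how joint patterns can sharpen detection of rare cell types. The original CelFiE also estimates unknown cell types and refines the reference profiles, which this sketch omits.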