Project description:High-throughput RNA sequencing has become the gold-standard method for whole-transcriptome gene expression analysis. It is widely used to study the transcriptomes of cells and tissues, and is increasingly considered for clinical applications, including expression profiling for diagnostics and detection of alternative transcripts. However, RNA sequencing can be challenging in some situations, for instance because of low input quantities or degraded RNA samples. Several protocols have been proposed to overcome some of these challenges, and many are available as commercial kits. Here we comprehensively test three recent commercial technologies for RNA-seq library preparation (Truseq, Smarter and Smarter Ultra-Low) on human reference tissue preparations, for standard (1 µg), low (100 and 10 ng) and ultra-low (< 1 ng) input quantities, and for mRNA and total RNA, stranded or unstranded. We analyze the results using read quality and alignment metrics, gene detection and differential gene expression metrics. Overall, we show that the Truseq kit performs well at 100 ng input quantity, while the Smarter kit shows degraded performance at 100 and 10 ng input quantities, and that the Smarter Ultra-Low kit performs quite well for input quantities < 1 ng. All the results are discussed in detail, and we provide guidelines to help biologists select an RNA-seq library preparation kit.
Project description:Recently developed methods that utilize partitioning of long genomic DNA fragments, and barcoding of shorter fragments derived from them, have succeeded in retaining long-range information in short sequencing reads. These so-called read cloud approaches represent a powerful, accurate, and cost-effective alternative to single-molecule long-read sequencing. We developed software, GROC-SVs, that takes advantage of read clouds for structural variant detection and assembly. We apply the method to two 10x Genomics data sets, one chromothriptic sarcoma with several spatially separated samples, and one breast cancer cell line, all Illumina-sequenced to high coverage. Comparison to short-fragment data from the same samples, and validation by mate-pair data from a subset of the sarcoma samples, demonstrate substantial improvement in specificity of breakpoint detection compared to short-fragment sequencing, at comparable sensitivity, and vice versa. The embedded long-range information also facilitates sequence assembly of a large fraction of the breakpoints; importantly, consecutive breakpoints that are closer than the average length of the input DNA molecules can be assembled together and their order and arrangement reconstructed, with some events exhibiting remarkable complexity. These features facilitated an analysis of the structural evolution of the sarcoma. In the chromothripsis, rearrangements occurred before copy number amplifications, and using the phylogenetic tree built from point mutation data, we show that single nucleotide variants and structural variants are not correlated. We predict significant future advances in structural variant science using 10x data analyzed with GROC-SVs and other read cloud-specific methods.
Project description:Recent studies have demonstrated that the non-coding genome can produce unannotated proteins that act as antigens and induce an immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (Long-Read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE-derived and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we demonstrated that long-read technology significantly improves mapping of promoters with low mappability scores and LRCAGE guarantees accurate construction of uncharacterized 5’ transcript structure. Unannotated peptides predicted from newly characterized transcripts were readily detectable in whole cell lysate mass-spectrometry data. Incorporating unannotated peptides into the proteome database enabled us to detect non-canonical antigens in HLA-pulldown LC-MS/MS data. Finally, we showed that epigenetic treatment increased the number of non-canonical antigens, particularly those encoded by TE-derived transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.
Project description:RNA-seq is the standard method for profiling gene expression in many biological systems. Due to the wide dynamic range and complex nature of the transcriptome, RNA-seq provides an incomplete characterisation, especially of lowly expressed genes and transcripts. Targeted RNA sequencing (RNA CaptureSeq) focuses sequencing on genes of interest, providing exquisite sensitivity for transcript detection and quantification. However, uses of CaptureSeq have focused on bulk samples and its performance on very small populations of cells is unknown. Here we show CaptureSeq greatly enhances transcriptomic profiling of target genes in ultra-low-input samples and provides equivalent performance to that on bulk samples. We validate the performance of CaptureSeq using multiple probe sets on samples of iPSC-derived cortical neurons. We demonstrate up to 275-fold enrichment for target genes, the detection of 10% additional genes and a greater than 5-fold increase in identified gene isoforms. Analysis of spike-in controls demonstrated CaptureSeq improved both detection sensitivity and expression quantification. Comparison to the CORTECON database of cerebral cortex development revealed CaptureSeq enhanced the identification of sample differentiation stage. CaptureSeq provides sensitive, reliable and quantitative expression measurements on hundreds-to-thousands of target genes from ultra-low-input samples and has the potential to greatly enhance transcriptomic profiling when samples are limiting.
Project description:Current methods for detection of copy number aberrations (CNA) from whole-exome sequencing (WES) data are based on the read counts of the captured exons only. However, accurate CNA determination is complicated by the non-uniform read depth and uneven distribution of exons. Therefore, we developed ENCODER (ENhanced COpy number Detection from Exome Reads), which circumvents these problems. By exploiting the ‘off-target’ sequence reads, it allows for creation of robust copy number profiles from WES. The accuracy of ENCODER is comparable to that of approaches specifically designed for copy number detection, and it outperforms current exon-based WES methods, particularly in samples of low quality. DNA copy number profiles generated with a new tool, ENCODER, were compared to DNA copy number profiles from SNP6, NimbleGen and low-coverage Whole Genome Sequencing.
Project description:Deeper understanding of T cell biology is crucial for the development of new therapeutics. Human naïve T cells have low RNA content and their numbers can be limiting; therefore we set out to determine the parameters for robust ultra-low input RNA sequencing. We performed transcriptome profiling at different cell inputs and compared three protocols: Switching Mechanism at 5’ End of RNA Template technology (SMART) with two different library preparation methods (Nextera (SMART_Nxt) and Clontech (SMART_CC)), and AmpliSeq technology. As the cell input decreased, the number of detected coding genes decreased with SMART, while it stayed constant with AmpliSeq. However, SMART enables detection of non-coding genes, which is not feasible with AmpliSeq. Detection depends on gene abundance, but not on transcript length. The consistency between technical replicates and cell inputs was comparable across methods above 1K cells but highly variable at 100 cell input. Sensitivity of detection for differentially expressed genes decreased dramatically with decreased cell inputs in all protocols, suggesting that additional approaches, such as pathway enrichment, are important for data interpretation at ultra-low input. Finally, a T cell activation signature was detected at 1K cell input and above in all protocols, with AmpliSeq showing better detection at 100 cells.
Project description:Here, we report an enrichment-based ultra-low input cfDNA methylation profiling method using methyl-CpG binding protein capture, termed cfMBD-seq. We optimized the conditions of cfMBD capture by adjusting the amount of MethylCap protein and by using methylated filler DNA. Our data showed that cfMBD-seq performs as well as standard MBD-seq (>1000 ng input) even when using 1 ng DNA as the input. cfMBD-seq demonstrated equivalent sequencing data quality and a similar methylation profile when compared to cfMeDIP-seq. We showed that cfMBD-seq outperforms cfMeDIP-seq in the enrichment of CpG islands. This new bisulfite-free ultra-low input methylation profiling technology has great potential in non-invasive and cost-effective cancer detection and classification.