Project description: Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneity within tissues. However, following short-read sequencing, these approaches measure only the distal parts of a transcript, so splicing and sequence-diversity information is lost for most of the transcript. Applying long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, but they are limited by the requirement to sequence each library with both short- and long-read sequencing. Here we introduce a novel approach, termed single-cell Barcode UMI Correction sequencing (scBUC-seq), to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized from blocks of dimeric nucleotides. The method can correct both short-read and long-read sequencing data, allowing users to recover more reads per cell and permitting direct single-cell Nanopore sequencing for the first time. We illustrate our method using species-mixing experiments to evaluate barcode assignment accuracy, multiple myeloma cell lines to evaluate differential isoform usage, and Ewing’s sarcoma cells to demonstrate Ig fusion transcript analysis.
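To illustrate why dimer-block synthesis helps, here is a minimal sketch, assuming each barcode base is written as a homodimer (A becomes AA, C becomes CC, and so on). A substitution error breaks a pair, so the position can be flagged as ambiguous ('N') and matched against a barcode whitelist as a wildcard. The function names `collapse_dimers` and `match_whitelist` are hypothetical, the whitelist is assumed to exist, indels (which shift the dimer frame) are not handled, and this is not the scBUC-seq implementation.

```python
from typing import Iterable, Optional


def collapse_dimers(seq: str) -> str:
    """Collapse a dimer-block read such as 'AACCGGTT' to monomers ('ACGT').

    A pair whose two bases disagree was broken by a sequencing error, so that
    position is marked 'N' (ambiguous) instead of guessing which base is right.
    """
    if len(seq) % 2:
        raise ValueError("dimer-encoded sequence must have even length")
    out = []
    for i in range(0, len(seq), 2):
        a, b = seq[i], seq[i + 1]
        out.append(a if a == b else "N")
    return "".join(out)


def match_whitelist(monomer: str, whitelist: Iterable[str],
                    max_mismatch: int = 0) -> Optional[str]:
    """Return the single whitelist barcode compatible with `monomer`.

    'N' positions act as wildcards; other positions may differ at most
    `max_mismatch` times. Returns None when zero or several barcodes match.
    """
    hits = []
    for bc in whitelist:
        if len(bc) != len(monomer):
            continue
        mismatches = sum(1 for x, y in zip(monomer, bc) if x != "N" and x != y)
        if mismatches <= max_mismatch:
            hits.append(bc)
    return hits[0] if len(hits) == 1 else None
```

For example, the read `AACCGTTT` collapses to `ACNT`, which is compatible only with `ACGT` in a whitelist of `["ACGT", "TTTT"]`.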
Project description: Sample index hopping refers to the incorrect sample assignment of a demultiplexed sequencing read in a library pool. To enable benchmarking of methods that measure the index hopping rate and remove its artifacts in single-cell RNA-seq data, we developed a validation dataset consisting of a multiplexed library of two samples, in which the true sample of origin is known for most reads. The reads with known sample of origin provide the ground truth for measuring the performance of index hopping correction methods.
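As a sketch of how such ground truth might be used, the snippet below computes an index hopping rate and the precision and recall of a correction method's flags. It assumes each read with a known origin is available as a hypothetical `(true_sample, assigned_sample)` pair and that the method under test outputs one boolean flag per read; neither format is specified by the dataset itself.

```python
from typing import Sequence, Tuple

Read = Tuple[str, str]  # (true_sample, assigned_sample); hypothetical format


def index_hopping_rate(reads: Sequence[Read]) -> float:
    """Fraction of reads whose demultiplexed sample differs from the true one."""
    if not reads:
        raise ValueError("no reads with known sample of origin")
    hopped = sum(1 for true_sample, assigned in reads if assigned != true_sample)
    return hopped / len(reads)


def evaluate_flags(reads: Sequence[Read], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of a method that flags reads as index-hopped."""
    if len(reads) != len(flagged):
        raise ValueError("one flag per read is required")
    tp = fp = fn = 0
    for (true_sample, assigned), flag in zip(reads, flagged):
        hopped = assigned != true_sample
        if flag and hopped:
            tp += 1
        elif flag:
            fp += 1
        elif hopped:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```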
Project description: Affinity capture of DNA methylation combined with high-throughput sequencing strikes a good balance between the high cost of whole-genome bisulfite sequencing and the low coverage of methylation arrays. We present BayMeth, an empirical Bayes approach that uses a fully methylated control sample to transform observed read counts into regional methylation levels. In our model, inefficient capture can readily be distinguished from low methylation levels. BayMeth improves on existing methods, allows explicit modeling of copy number variation, and offers computationally efficient analytical mean and variance estimators. BayMeth is available in the Repitools Bioconductor package. This dataset contains benchmarking samples for comparing MBD- and MeDIP-seq [GSE38679, GSE24546; PMID 21045081] datasets against 450k array measurements.
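The role of the fully methylated control can be illustrated with a simplified toy model, which is an assumption for illustration and not BayMeth's actual model (it omits copy number and BayMeth's prior on the methylation level). Suppose the control count is $y_c \sim \text{Poisson}(\lambda)$, the sample count is $y_s \sim \text{Poisson}(f\mu\lambda)$ with methylation level $\mu \in [0, 1]$ and normalizing offset $f$, and $\lambda \sim \text{Gamma}(a, b)$. Integrating out $\lambda$ gives a likelihood proportional to $\mu^{y_s} / (f\mu + 1 + b)^{y_s + y_c + a}$. The sketch below evaluates the posterior of $\mu$ on a grid under a uniform prior; a region with few control reads (inefficient capture) yields a wide posterior instead of a falsely low methylation estimate.

```python
import math
from typing import Tuple


def posterior_methylation(y_sample: int, y_control: int, f: float = 1.0,
                          a: float = 1.0, b: float = 1.0,
                          grid_size: int = 1000) -> Tuple[float, float]:
    """Posterior mean and variance of a region's methylation level mu.

    Toy model (assumed, simplified):
      y_control ~ Poisson(lam)            fully methylated control
      y_sample  ~ Poisson(f * mu * lam)   sample of interest
      lam ~ Gamma(shape=a, rate=b),  mu ~ Uniform(0, 1)
    With lam integrated out analytically:
      p(y_sample, y_control | mu) ∝ mu**y_sample / (f*mu + 1 + b)**n,
      where n = y_sample + y_control + a.
    """
    n = y_sample + y_control + a
    mus = [(i + 0.5) / grid_size for i in range(grid_size)]  # midpoint grid
    log_lik = [y_sample * math.log(m) - n * math.log(f * m + 1 + b) for m in mus]
    top = max(log_lik)  # shift by the maximum for numerical stability
    w = [math.exp(v - top) for v in log_lik]
    z = sum(w)
    mean = sum(m * wi for m, wi in zip(mus, w)) / z
    var = sum((m - mean) ** 2 * wi for m, wi in zip(mus, w)) / z
    return mean, var
```

With 100 control reads, 0 sample reads gives a low mean and 100 sample reads a high one, while 0 reads in both leaves the posterior close to uniform.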
Project description: Objectives: To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods: PBMCs were exposed for 24 hours to four microbial stimuli: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, and Candida albicans; RPMI medium served as the negative control. Long-read sequencing (PacBio) was performed for one donor, and secretome proteomics and short-read sequencing for five donors. IsoQuant was used for transcriptome construction, MetaMorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results: Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression at the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion: Isoform-aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of these methods to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.
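Isoform switching is usually expressed as a change in the fraction of a gene's expression carried by each transcript. The sketch below assumes transcript-level counts keyed by hypothetical `(gene, transcript)` pairs, such as could be derived from a transcript quantification table. It computes within-gene usage fractions and their shift between a control and a stimulated condition. It is an illustration only, not the analysis performed in this study, and it applies no statistical test.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (gene, transcript); hypothetical identifiers


def isoform_fractions(counts: Dict[Key, float]) -> Dict[Key, float]:
    """Convert (gene, transcript) -> count into within-gene usage fractions."""
    gene_totals: Dict[str, float] = defaultdict(float)
    for (gene, _tx), c in counts.items():
        gene_totals[gene] += c
    return {key: (c / gene_totals[key[0]] if gene_totals[key[0]] else 0.0)
            for key, c in counts.items()}


def usage_shift(control: Dict[Key, float],
                stimulated: Dict[Key, float]) -> Dict[Key, float]:
    """Change in usage fraction (stimulated minus control) per transcript."""
    fc = isoform_fractions(control)
    fs = isoform_fractions(stimulated)
    return {k: fs.get(k, 0.0) - fc.get(k, 0.0) for k in set(fc) | set(fs)}
```

For a made-up gene `G` whose transcript `t1` drops from 80% to 20% of reads after stimulation, `usage_shift` reports about -0.6 for `t1` and +0.6 for `t2`, even if the gene's total expression is unchanged.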
Project description: Transposon insertion site sequencing (TIS) is a powerful method for associating genotype with phenotype. However, all TIS methods described to date use short nucleotide sequence reads that cannot uniquely determine the locations of transposon insertions within repetitive genomic sequences whose repeat units are longer than the read length. To overcome this limitation, we developed a TIS method, LoRTIS (Long Read Transposon Insertion-site Sequencing), that generates and uses long nucleotide sequence reads from Oxford Nanopore sequencing technology. This dataset contains sequence files generated on the Nanopore and Illumina platforms. Biotin1308.fastq.gz and Biotin2508.fastq.gz are FASTQ files generated using Nanopore technology. Rep1-Tn.fastq.gz and Rep1-Tn.fastq.gz are FASTQ files generated using the Illumina platform. In this study, we compared the efficiency of the two methods in identifying transposon insertion sites.
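The read-length limitation can be shown with a toy example. The genome, repeat unit, and flank lengths below are made up for illustration. A genomic flank shorter than the repeat unit matches every copy of the repeat, whereas a flank that extends past the repeat matches only the true insertion site.

```python
from typing import List


def placements(genome: str, flank: str) -> List[int]:
    """All start positions where `flank` occurs exactly in `genome` (overlaps allowed)."""
    hits, start = [], genome.find(flank)
    while start != -1:
        hits.append(start)
        start = genome.find(flank, start + 1)
    return hits


repeat = "ACGTACGTTGCA"  # hypothetical 12-bp repeat unit
genome = "GATTACA" + repeat + "CCGG" + repeat + "TTAACC"
insertion = genome.rfind(repeat)  # transposon inserted at the start of the 2nd copy

short_flank = genome[insertion:insertion + 8]   # shorter than the repeat unit
long_flank = genome[insertion:insertion + 16]   # reaches beyond the repeat

print(placements(genome, short_flank))  # two candidate sites: ambiguous
print(placements(genome, long_flank))   # one site: the true insertion
```

Real insertion-site mapping uses read alignment rather than exact string search, but the ambiguity it faces inside long repeats is the same.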
Project description: Deconvolution methods infer quantitative cell type estimates from bulk measurements of mixed samples such as blood and tissue. DNA methylation sequencing measures multiple CpGs per read, but few existing deconvolution methods leverage this within-read information. We developed CelFiE-ISH, which extends an existing method (CelFiE) to use within-read haplotype information. CelFiE-ISH outperforms CelFiE and other existing methods, achieving 30% better accuracy and more sensitive detection of rare cell types. We also demonstrate the importance of marker selection and of tailoring markers to haplotype-aware methods. Although we use gold-standard short-read sequencing data here, haplotype-aware methods will be well suited to long-read sequencing.
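One way to use within-read information is to treat each read as originating from a single cell type, so that all CpGs on the read share one assignment. The sketch below is a simplified expectation-maximization (EM) estimate of cell-type proportions under that assumption. It takes fixed, assumed reference methylation probabilities per cell type and CpG (`beta`) and is not CelFiE-ISH itself, which is more elaborate. The array layouts and the function name `deconvolve_reads` are hypothetical.

```python
import numpy as np


def deconvolve_reads(reads, mask, beta, n_iter=200, tol=1e-8):
    """EM estimate of cell-type proportions from read-level methylation calls.

    reads: (R, J) array of 0/1 methylation calls per read and CpG
    mask:  (R, J) bool array, True where the read covers that CpG
    beta:  (K, J) assumed reference methylation probability per cell type and CpG
    Returns a length-K vector of proportions that sums to 1.
    """
    reads = np.asarray(reads, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    beta = np.clip(np.asarray(beta, dtype=float), 1e-6, 1 - 1e-6)
    # log P(read | cell type): sum over the CpGs the read actually covers
    meth = reads * mask
    unmeth = (1.0 - reads) * mask
    log_lik = meth @ np.log(beta).T + unmeth @ np.log1p(-beta).T  # (R, K)
    k = beta.shape[0]
    alpha = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each cell type
        log_post = log_lik + np.log(alpha)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: proportions are the mean responsibilities across reads
        new_alpha = resp.mean(axis=0)
        converged = np.abs(new_alpha - alpha).max() < tol
        alpha = new_alpha
        if converged:
            break
    return alpha
```

With two reference cell types that are almost fully methylated and almost fully unmethylated, three methylated reads and one unmethylated read give proportions close to 0.75 and 0.25.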