Project description: We apply hierarchical clustering (HC) of DNA k-mer counts to multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, whereas clustering by preparation group indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. To provide a simple, readily applicable tool, we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. The approach is validated by analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicated the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides a nonspecific diagnostic tool for RNAseq experiments. Further exploration is required once samples are identified as outliers in HC-derived trees.
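The idea described above can be illustrated with a minimal Python sketch. It is not the seqTools implementation: the function names (`kmer_counts`, `canberra`, `average_linkage`), the choice of Canberra distance with average linkage, and the toy reads are all assumptions made for illustration. Reads are consumed one at a time, so only the 4^k count table is held in memory.

```python
from itertools import product


def kmer_counts(reads, k):
    """Count DNA k-mers over an iterable of reads, one read at a time.

    k-mers containing bases other than A, C, G, T (e.g. N) are skipped.
    Returns counts ordered by the lexicographically sorted k-mers.
    """
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
    return [counts[key] for key in sorted(counts)]


def normalize(counts):
    """Scale counts to relative frequencies so that library size cancels out."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)


def canberra(u, v):
    """Canberra distance; positions where both values are zero contribute nothing."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles maps sample name -> k-mer frequency vector.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample names.
    """
    names = sorted(profiles)
    dist = {
        frozenset((a, b)): canberra(profiles[a], profiles[b])
        for a in names for b in names if a < b
    }
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[frozenset((a, b))]
                         for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)  # mean pairwise distance
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Toy reads (invented for illustration): two samples per hypothetical group.
samples = {
    "A1": ["ACGTACGTAC", "ACGTACGGAC"],
    "A2": ["ACGTACGTAC", "ACGTACGTTC"],
    "B1": ["GGGCCCGGGC", "GGCCCGGGCC"],
    "B2": ["GGGCCCGGGC", "GGGCCCGGCC"],
}
profiles = {name: normalize(kmer_counts(reads, 2))
            for name, reads in samples.items()}
merges = average_linkage(profiles)

for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

In this toy example the samples within each group merge before the two groups are joined. Whether such a grouping reflects experimental or preparation groups has to be judged from the sample annotation, as the description notes.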
Project description: Clear cell renal cell carcinoma (ccRCC) is the most common form of kidney cancer. Following primary tumour resection, approximately 30% of patients experience disease recurrence associated with metastasis. To date, long-read RNA sequencing has not been applied to kidney cancer. Here, we used Oxford Nanopore Technologies (ONT) long-read Direct RNA sequencing (DRS) to profile the transcriptomes of ccRCC archival tumours, six of which were from patients who went on to relapse. Our results revealed a loss of immune infiltrate in tumours of patients who relapsed. Moreover, thousands of novel isoforms were discovered, including a novel PD-L1 transcript that encodes the soluble form of the protein but has a longer 3'UTR than the currently annotated transcript. Finally, we identified a novel non-coding gene that was over-expressed in patients who experienced recurrence. Our data show that DRS can be used on archival tumour samples to comprehensively characterise tumour transcriptomes and to reveal novel features that would have been missed by short-read RNA-seq.
Project description: When deconvolution methods are applied to bulk RNA-seq data, a limitation of most prior studies is the lack of paired scRNA-seq and bulk RNA-seq data from the same samples to serve as ground truth. Our UC cohort, which contains matched scRNA-seq and bulk RNA-seq data, therefore provided a unique opportunity. (The scRNA-seq datasets have been published previously [PMID: 32111252, PMID: 36129800, PMID: 36099881, PMID: 33837006]; please see the linked manuscript for details.) We assembled a scRNA-seq dataset of 100,667 cells from 30 UC tissue samples (20 unique patients). Because of tissue availability, bulk RNA sequencing was performed on a subset of 14 patients from the scRNA-seq cohort.