ABSTRACT: Purpose: We used RNA deep sequencing to characterize KSHV expression in a large collection of KS biopsies (n=41) from HIV-infected Ugandans. Using a novel approach to quantitate expression in complex genomes like KSHV, we found that RNA from a single KSHV promoter within the latency region constituted the majority of KSHV transcripts in the KS tumors. Alternate RNA processing produced different spliced and un-spliced transcripts with different coding potentials. Differential expression of other KSHV genes was detected which segregated the tumors into three different types depending on their expression of lytic or latency genes. Quantitative analysis of KSHV expression in KS tumors provides an important basis for future studies on the role of KSHV in the development of KS Methods: Total nucleic acids were extracted using RLT buffer (Qiagen) and then RNA was isolated using the RNeasy mini kit with DNAse treatment step. Total RNA integrity was checked using an Agilent 2200 TapeStation (Agilent Technologies, Inc., Santa Clara, CA) and quantified using a Trinean DropSense96 spectrophotometer (Caliper Life Sciences, Hopkinton, MA). Unstranded RNA-seq libraries were prepared from 300 ng of total RNA using the TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., San Diego, CA, USA). Four KS tumors libraries were prepared using the TruSeq Stranded mRNA Library Kit (Illumina) from 100 ng of total RNA. Library size distributions were validated using an Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA). Additional library QC, pooling of indexed libraries, and cluster optimization was performed using Life Technologies’ Invitrogen Qubit® 2.0 Fluorometer (Life Technologies-Invitrogen, Carlsbad, CA, USA). The unstranded RNA-seq libraries were pooled (5-plex) and the stranded libraries were pooled (4-plex) and each pool was clustered onto a flow cell lane. Sequencing was performed using an Illumina HiSeq 2500 in “High Output” mode with a paired-end, 50 base reads (PE50) sequencing strategy for the unstranded libraries and non-paired end for the stranded libraries. Image analysis and base calling was performed using Illumina's Real Time Analysis v1.18 software, followed by 'demultiplexing' of indexed reads and generation of FASTQ files, using Illumina's bcl2fastq Conversion Software v1.8.2. Analysis: Low-quality reads were removed prior to alignment to the KSHV (aka HHV-8) reference sequence (accession number NC_009333 for the KSHV GK18 strain) using TopHat2 (version 2.0.14) in a local instance of Galaxy. For quantitation purposes, the reads from paired-end libraries (non-stranded) were analyzed as unpaired (single-end data) to allow each read of the pair to map unambiguously to a single gene feature. The reads from both strands of the stranded libraries (non-paired end) were either concatenated and analyzed together (librarytype=unstranded) for comparison to the non-stranded paired end libraries or were analyzed separately (librarytype=FR) to show strand specificity. TopHat2 was used to detect splicing events ab initio. The default presets were used except that the maximum intron length was decreased to 10,000 and the maximum number of alignments allowed was decreased from 20 to 1, to avoid overcounting reads to repetitive regions. HTSeq (version 0.6) was used to quantitate the reads mapping to the unique set of UCDS gene features within the novel revised gene feature file "KSHV NC_009333 UCDS ver 121715.gff". The “intersection (non-empty)” setting in HTSeq was used to count all reads mapping completely or partially to a UCDS feature to maximize read count (featuretype=UCDS; IDattribute=gene). No reads were eliminated by ambiguity since the UCDS features were 50 bp apart, the length of a read. The read count was expressed as transcripts per million (TPM) by first normalizing the read count to reads per kilobase (RPK) by dividing the read counts by the length of the UCDS gene feature, in kilobases. The “per million” scaling factor was determined by summing all of the RPK values in a sample and dividing by 1,000,000. Each RPK value was then divided by the “per million” scaling factor to give TPM of mapped KSHV reads. Hierarchical clustering of TPM normalized expression levels was performed using the algorithm implemented in CIMminer. Hierarchical clustering of the gene correlation matrix was performed by calculating the Pearson correlation between the normalized transcript levels (TPM) associated with each pair of UCDS gene features, using a script in R to create and output the correlation matrix. Shiny web applications were developed for R-based principal component analysis (available at https://efg-ds.shinyapps.io/pcaApp/) and boxplot analysis of gene expression levels (available at https://efg-ds.shinyapps.io/boxplotApp/). Conclusions: We have used RNAseq to analyze and quantitate KSHV gene expression in a large collection of 41 KS tumor biopsies from HIV-infected individuals in Uganda that were naïve to ART. The RNAseq libraries were sequenced to an average depth of 100 million reads yielding up to 159,000 KSHV-mapped reads, and RNA reads mapping to non-overlapping UCDS features were quantitated using the new UCDS gene feature file. Phylogenetic analysis of the complete ORF75 sequence revealed the presence of at least six different KSHV strains in these tumor samples. We identified a set of transcripts from the latency region at the right end of the genome that was highly and consistently expressed in all the KS tumors. Variable expression of genes involved with transcription regulation and immune modulation was observed, with minimal expression of genes involved with viral replication, virion structure, or assembly. No correlation was observed between the KSHV transcript patterns detected in macular, nodular or fungating KS lesions. In contrast, the transcript patterns of different tumor lesions from the same individual showed the most similarity regardless of the tumor morphotype.