Project description: Because they can compromise genome integrity, transposable elements (TEs) are significantly associated with human diseases. Short-read sequencing has been used to study TE expression; however, the highly repetitive nature of these elements makes multimapping a critical issue. Here we implement LocusMasterTE, an improved quantification method that integrates long-read sequencing. LocusMasterTE introduces transcripts per million (TPM) values computed from long-read sequencing as the prior distribution in the Expectation-Maximization (EM) model used for short-read TE quantification, so that multi-mapped reads are reassigned to correct expression values. On simulated short reads, LocusMasterTE outperforms current quantification approaches and has a significant advantage in capturing newly inserted TEs. We also verified that TEs quantified by LocusMasterTE are clearly related to euchromatin and heterochromatin in cell line samples. We anticipate that LocusMasterTE will enable more accurate quantification, allowing novel functions of TEs to be uncovered.
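The idea of using long-read TPM values as a prior in an EM model can be illustrated with a minimal sketch. This is not LocusMasterTE's implementation: the function name `em_reassign`, the 0/1 compatibility matrix `compat`, the `strength` parameter, and the choice to apply the prior as pseudocounts in the M-step are all assumptions made for illustration.

```python
import numpy as np

def em_reassign(compat, prior_tpm, strength=1.0, n_iter=200, tol=1e-10):
    """Toy EM for multi-mapped reads (hypothetical helper, not the tool's API).

    compat    : (n_reads, n_loci) 0/1 matrix, 1 where a read aligns to a locus.
                Every read is assumed to align to at least one locus.
    prior_tpm : (n_loci,) long-read TPM values used as the prior.
    strength  : assumed weight of the prior, expressed in pseudo-reads.
    """
    compat = np.asarray(compat, dtype=float)
    # Small constant keeps loci with zero long-read TPM from being ruled out.
    prior = np.asarray(prior_tpm, dtype=float) + 1e-6
    prior = prior / prior.sum()
    n_reads = compat.shape[0]
    theta = prior.copy()  # start from the long-read prior
    for _ in range(n_iter):
        # E-step: split each read across its compatible loci in proportion to theta.
        resp = compat * theta
        resp = resp / resp.sum(axis=1, keepdims=True)
        # M-step: expected read counts plus prior pseudocounts, renormalised.
        theta_new = (resp.sum(axis=0) + strength * prior) / (n_reads + strength)
        converged = np.abs(theta_new - theta).max() < tol
        theta = theta_new
        if converged:
            break
    # Recompute assignments with the final abundance estimate.
    resp = compat * theta
    resp = resp / resp.sum(axis=1, keepdims=True)
    return theta, resp

# Toy input: read 0 is unique to locus 0, reads 1-2 multimap to loci 0 and 1,
# read 3 is unique to locus 2. The TPM values are made up for the example.
compat = [[1, 0, 0],
          [1, 1, 0],
          [1, 1, 0],
          [0, 0, 1]]
theta, resp = em_reassign(compat, prior_tpm=[50.0, 5.0, 45.0])
```

In this toy case the multi-mapped reads are assigned mostly to locus 0, which has both a unique read and the higher long-read TPM. The `strength` parameter controls how strongly the long-read prior pulls the estimate relative to the short-read evidence.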
Project description: A chromosome-level nuclear genome and the organelle genomes of the alpine snow alga Chloromonas typhlos were sequenced and assembled by integrating short- and long-read sequencing with a proteogenomic strategy.
Project description: Transposable elements (TEs) serve as both insertional mutagens and regulatory elements in cells, and their aberrant activity is increasingly being shown to contribute to diseases and cancers. However, measuring the transcriptional consequences of nonreference and young TEs at individual loci remains challenging with current methods, primarily because of technical limitations, including the short read lengths generated and insufficient coverage of target regions. Here, we introduce a long-read targeted RNA sequencing method, Cas9-assisted profiling TE expression sequencing (capTEs), for quantitative analysis of the transcriptional outputs of individual TEs, including transcribed nonreference insertions, noncanonical transcripts from various transcription patterns, and their correlations with expression changes in related genes. This method selectively identified TE-containing transcripts and produced data with up to 90% TE reads while maintaining a data yield comparable to whole-transcriptome sequencing. We applied capTEs to human cancer cells and found that internal and inserted Alu elements may employ distinct regulatory mechanisms to upregulate gene expression.
Project description: Pioneering studies (PXD014844) have identified many interesting molecules in tick saliva by LC-MS/MS proteomics, but the protein databases used to assign mass spectra were based on short Illumina reads of the Amblyomma americanum transcriptome and may not have captured the diversity and complexity of longer transcripts. Here we apply Pacific Biosciences long-read technology to complement the previously reported short-read Illumina transcriptome-based proteome in an effort to increase spectrum assignments. Our dataset reveals a small increase in assignable spectra and supplements the previously released short-read transcriptome-based proteome.
Project description: Pioneering studies (PXD014844) have identified many interesting molecules by LC-MS/MS proteomics, but the protein databases used to assign mass spectra were based on short Illumina reads of the Amblyomma americanum transcriptome and may not have captured the diversity and complexity of longer transcripts. Here we apply Pacific Biosciences long-read technology to complement the previously reported short-read Illumina transcriptome-based proteome in an effort to increase spectrum assignments. Our dataset reveals a small increase in assignable spectra and supplements the previously released short-read transcriptome-based proteome.