Project description: Unmapped reads are usually considered useless and are discarded or ignored. Here, we develop a strategy for mining the full-length sequence from unmapped reads by combining specific reverse transcription primer design with high-throughput sequencing. In this study, we salvage 36 unmapped reads from standard RNA-Seq data (GSM3188619) and randomly select one 149 bp read as a model (CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCA). Specific reverse transcription primers (5' end: CTGGTGCCATAATTCAGGGA, 3' end: GGATCTTCACGTAACGGATTGT) are designed to amplify both of its ends, followed by next-generation sequencing. We then use a statistical model based on a power-law distribution to estimate the integrity and significance of the assembled sequence, and validate it by Sanger sequencing. The results show that the full-length sequence is 1,556 bp, with an InDel mutation in a microsatellite structure. This strategy could be useful for extracting sequence information from unmapped RNA-seq data.
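As a quick consistency check on the sequences quoted in the description above, the following minimal Python sketch confirms the stated read length and looks for each primer in the model read on either strand. The names `READ`, `PRIMER_5`, `PRIMER_3`, `reverse_complement`, and `locate` are illustrative and not part of the original project. Exact string matching is an assumption made here for simplicity, not the authors' primer design procedure.

```python
# Sequences copied verbatim from the project description.
READ = (
    "CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACAT"
    "TTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAG"
    "GAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCA"
)
PRIMER_5 = "CTGGTGCCATAATTCAGGGA"
PRIMER_3 = "GGATCTTCACGTAACGGATTGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(primer, template):
    """Return (strand, 0-based offset) of the first exact match, or None.

    Hypothetical helper: it only finds exact matches, with no mismatches.
    """
    pos = template.find(primer)
    if pos != -1:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos != -1:
        return ("-", pos)
    return None


if __name__ == "__main__":
    print("read length:", len(READ))
    print("5' primer:", locate(PRIMER_5, READ))
    print("3' primer:", locate(PRIMER_3, READ))
```

The 5' primer is identical to the first 20 bases of the model read, which the check reports as a forward-strand match at offset 0. The same call can be used to test whether the 3' primer, or any other candidate primer, lies within a given read.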
Project description: Background: As a powerful tool, RNA-Seq has been widely used in various studies. Unmapped RNA-seq reads are usually considered useless and are discarded or ignored. Results: We develop a strategy for mining the full-length sequence from unmapped reads by combining specific reverse transcription primer design with high-throughput sequencing. In this study, we salvage 36 unmapped reads from standard RNA-Seq data and randomly select one 149 bp read as a model. Specific reverse transcription primers are designed to amplify both of its ends, followed by next-generation sequencing. We then design a statistical model based on a power-law distribution to estimate the integrity and significance of the assembled sequence, and validate it by Sanger sequencing. The results show that the full-length sequence is 1,556 bp, with insertion mutations in a microsatellite structure. Conclusion: We believe this method is a useful strategy for extracting sequence information from unmapped RNA-seq data. It also offers an alternative way to obtain the full-length sequence of an unknown cDNA.
Project description: Circular RNAs (circRNAs) have been found to be abundantly expressed in cancer. Their resistance to exonucleases enables them to form potentially stable interactions with different types of biomolecules. Alternative splicing can create different circRNA isoforms that have different sequences and unequal interaction potentials. The study of circRNA function thus requires knowledge of complete circRNA sequences. Here we describe psirc, a method that can identify full-length circRNA isoforms and quantify their expression levels using RNA sequencing data. We confirm the effectiveness and computational efficiency of psirc using both simulated and actual experimental data. Applying psirc to transcriptome profiles from nasopharyngeal carcinoma and normal nasopharynx samples, we discovered circRNA isoforms that are differentially expressed between the two groups. Compared to the assumed circular isoforms derived from linear transcript annotations, some of the alternatively spliced circular isoforms have 100 times higher expression and contain fewer microRNA response elements, demonstrating the importance of quantifying full-length circRNA isoforms.
Project description: We report an atlas of de novo-defined, full-length macaque gene models based on single-molecule long-read transcriptome sequencing (Iso-seq).