Project description:The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap Projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.
Project description:The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap Projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes. This SuperSeries is composed of the SubSeries listed below.
Project description:The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes.
Project description:Using anti-TET1 immunoprecipitated chromatin from mouse frontal cortices, we generated TET1 binding maps with 4496 high-confidence peaks; approximately 90% of these peaks were within intergenic regions.
Project description:1. Evaluate the diagnostic value of long noncoding RNA (CCAT1) expression, measured by RT-PCR in peripheral blood, in colorectal cancer patients versus healthy controls.
2. Evaluate the clinical utility of detecting long noncoding RNA (CCAT1) expression in the diagnosis of colorectal cancer and its relation to tumor staging.
3. Evaluate the clinical utility of detecting long noncoding RNA (CCAT1) expression in precancerous colorectal diseases.
4. Compare long noncoding RNA (CCAT1) expression with the traditional markers carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9) in the diagnosis of colorectal cancer.
Project description:To understand the role of long non-coding RNAs and their interaction with coding RNAs in bladder urothelial cell carcinoma (BUCC), we performed genome-wide screening of long non-coding RNA and coding RNA expression in primary BUCC tissues and normal tissues using a long non-coding RNA array (Agilent platform, GPL13825). By comparing these two groups, significantly differentially expressed lncRNAs and coding RNAs were identified. We further identified a subset of long noncoding RNAs and their correlation with neighboring coding genes using bioinformatic tools. This analysis provides a fundamental understanding of how the transcriptomic landscape changes during bladder carcinogenesis. 12 BUCC primary tumors and 3 normal tissues were used for long noncoding RNA array experiments, which covered both long non-coding RNAs and coding RNAs. The differential expression of a subset of long noncoding RNAs and their interaction with coding RNAs in BUCC compared with normal tissue will be identified by computational analysis.
Project description:Abiotic environmental stresses cause serious economic losses in agriculture. These stresses include temperature extremes, high salinity and drought. To isolate novel drought-responsive coding and noncoding genes, we applied next-generation sequencing to three rice cultivars (wild-type Nipponbare, Nipponbare AP2 transgenic plants, and wild-type Vandana). 36 NGS datasets (mRNA-seq, small RNA-seq, and riboZero-seq) were analyzed. To analyze these data, we constructed a TF-TG (Transcription Factor-Target Gene) network and an AP2-rooted cascading tree. Using these networks and trees, we isolated lincRNAs, differentially expressed miRNAs and their targets. From these analyses we identified several drought stress-related coding transcripts that are novel or of unknown function (transcription factors and functional genes) and non-coding transcripts (small noncoding transcripts such as microRNAs, and long noncoding transcripts), and we have constructed databases of drought stress-related coding and noncoding transcripts. Identification of drought-responsive Regulatory Coding and Non-coding Transcripts from rice by deep RNA sequencing