Project description:We want to develop a transcriptome assembly pipeline that significantly improves the quality of assemblies constructed from stranded and/or unstranded RNA-seq data.
Project description:We want to develop a transcriptome assembly pipeline that significantly improves the quality of assemblies constructed from stranded and/or unstranded RNA-seq data. The transcriptome of mouse embryonic stem cells (mESCs) was assembled using stranded and unstranded libraries sequenced on an Illumina HiSeq 2000.
Project description:We selected human intervertebral disc samples for proteomic analysis. According to the Pfirrmann classification, there was 1 case of grade I, 1 case of grade II, 3 cases of grade III, and 3 cases of grade IV. RNA sequencing and single-cell RNA sequencing data were integrated with the proteomics data to identify hub genes for intervertebral disc degeneration using bioinformatic methods.
Project description:A single hematopoietic stem cell can give rise to all blood cells with remarkable fidelity. Here, we define the chromatin accessibility and transcriptional landscape controlling this process in thirteen primary cell types that traverse the hematopoietic hierarchy. Exploiting the finding that enhancer landscapes better reflect cell identity than mRNA levels, we enable "enhancer cytometry" for accurate enumeration of pure cell types from complex populations. We further reveal the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia, chromatin accessibility reveals distinctive regulatory evolution in pre-leukemic HSCs (pHSCs), leukemia stem cells, and leukemic blasts. These leukemic cells demonstrate unique lineage infidelity, confirmed by single-cell regulomes. We further show that pHSCs have a competitive advantage that is conferred by reduced chromatin accessibility at HOXA9 targets and is associated with adverse patient outcomes. Thus, regulome dynamics can provide diverse insights into human hematopoietic development and disease. This data set contains transcription profiles of hematopoietic and leukemic cells, assayed using unstranded RNA-seq, across 13 normal hematopoietic cell types and 3 acute myeloid leukemia cell types. The complete data set contains a total of 81 samples.
Project description:The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete, partly because they were mostly reconstructed from RNA-seq reads that lack orientation information (known as unstranded reads) and certain boundary information. Methods that expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves original assemblies built from stranded and/or unstranded RNA-seq data by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applied to large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE and Human BodyMap projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.
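The idea of orienting unstranded reads by maximum likelihood can be illustrated with a minimal sketch. This is not the CAFE implementation: it assumes a deliberately simple model in which stranded reads at a locus are treated as binomial draws, so the maximum likelihood estimate of the plus-strand fraction is the observed proportion. The names `StrandCounts`, `mle_plus_fraction`, `orient_read`, the locus identifiers, and the 0.9 confidence threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StrandCounts:
    """Stranded read counts observed at one locus (hypothetical structure)."""
    plus: int
    minus: int


def mle_plus_fraction(counts: StrandCounts) -> float:
    """Binomial MLE of the fraction of reads from the plus strand.

    With no stranded evidence the estimate is undefined, so this sketch
    falls back to 0.5 (no preference). That fallback is an assumption.
    """
    total = counts.plus + counts.minus
    if total == 0:
        return 0.5
    return counts.plus / total


def orient_read(locus: str,
                stranded: Dict[str, StrandCounts],
                min_confidence: float = 0.9) -> str:
    """Assign '+', '-', or '.' (undetermined) to an unstranded read at `locus`."""
    counts = stranded.get(locus)
    if counts is None:
        return "."  # no stranded evidence for this locus
    p_plus = mle_plus_fraction(counts)
    if p_plus >= min_confidence:
        return "+"
    if 1.0 - p_plus >= min_confidence:
        return "-"
    return "."  # evidence too balanced to call a strand


if __name__ == "__main__":
    evidence = {
        "locusA": StrandCounts(plus=95, minus=5),
        "locusB": StrandCounts(plus=2, minus=48),
        "locusC": StrandCounts(plus=10, minus=10),
    }
    for locus in ("locusA", "locusB", "locusC", "locusD"):
        print(locus, orient_read(locus, evidence))
```

In this sketch, reads at loci with balanced or missing stranded evidence stay undetermined rather than being forced onto a strand; a real pipeline would need a richer model and per-read rather than per-locus evidence.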