Project description:To test the global effects of CpG island-centered gene regulation on gene expression profiles, pA+ RNA-seq data from diverse tissues and cell lines were gathered and profiled. All mouse poly(A)+ RNA-seq datasets available at the time (3,818 samples) were summarized and downloaded on May 5, 2015. After excluding single-cell RNA-seq and experiments in which fewer than 5,000 genes had an RPKM of 0.5 or higher, 1,524 high-quality RNA-seq datasets were used. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned with STAR 2.4.2 to the mouse and human genomes (mm9 and hg19, respectively). Gene expression was calculated as RPKM values using rpkmforgenes.py (Ramsköld et al., 2009).
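The quality filter above combines two steps: converting read counts to RPKM and counting how many genes reach RPKM ≥ 0.5. The sketch below uses the textbook RPKM definition (reads × 10^9 / (exon length in bp × total mapped reads)). It is an illustration only: rpkmforgenes.py may differ in details such as which reads count toward the library total, and the function names here are hypothetical.

```python
from typing import Dict


def rpkm(counts: Dict[str, int], exon_lengths_bp: Dict[str, int],
         total_mapped_reads: int) -> Dict[str, float]:
    """Textbook RPKM: reads per kilobase of exon per million mapped reads.

    Assumption: the denominator uses all mapped reads; rpkmforgenes.py's
    exact normalization may differ.
    """
    return {
        gene: counts[gene] * 1e9 / (exon_lengths_bp[gene] * total_mapped_reads)
        for gene in counts
    }


def passes_expression_filter(rpkm_values: Dict[str, float],
                             min_rpkm: float = 0.5,
                             min_genes: int = 5000) -> bool:
    """Keep a sample only if at least `min_genes` genes have RPKM >= `min_rpkm`."""
    detected = sum(1 for value in rpkm_values.values() if value >= min_rpkm)
    return detected >= min_genes
```

For example, 100 reads on a 1,000 bp gene in a library of 1,000,000 mapped reads gives an RPKM of 100; a sample is then retained only if `passes_expression_filter` returns `True`.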
Project description:We reanalyzed published RNA-seq data to study (1) the genomic landscape surrounding transcription start sites in relation to gene expression activity and (2) changes in gene expression upon depletion of the transcription factors MYBL1 and ATF4. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned with STAR 2.4.2 to the mouse and human genomes (mm9 and hg19, respectively). Gene expression was calculated as RPKM values using rpkmforgenes.py (Ramsköld et al., 2009).
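The extraction and alignment steps can be sketched as command lines built in Python for `subprocess.run`. This is a minimal sketch under stated assumptions: the run accession, output directory and STAR genome index path are hypothetical placeholders, only basic `fastq-dump` and `STAR` options are used, and the actual parameters of the original workflow are not given in the description.

```python
from pathlib import Path
from typing import List


def sra_star_commands(run_id: str, genome_dir: str, out_dir: str,
                      paired: bool, threads: int = 8) -> List[List[str]]:
    """Build (but do not run) fastq-dump and STAR command lines for one SRA run.

    Assumptions: `genome_dir` is a prebuilt STAR index (e.g. for mm9 or hg19),
    and FASTQ files are written uncompressed into `out_dir`.
    """
    out = Path(out_dir)
    dump = ["fastq-dump", "-O", str(out)]
    if paired:
        # --split-files writes mates to <run>_1.fastq and <run>_2.fastq
        dump.append("--split-files")
        reads = [str(out / f"{run_id}_1.fastq"), str(out / f"{run_id}_2.fastq")]
    else:
        reads = [str(out / f"{run_id}.fastq")]
    dump.append(run_id)

    star = ["STAR", "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", *reads,
            "--outFileNamePrefix", str(out / f"{run_id}_")]
    return [dump, star]
```

Usage: `for cmd in sra_star_commands("SRR000001", "/ref/mm9_star", "/data", paired=True): subprocess.run(cmd, check=True)`. Returning argument lists instead of shell strings avoids quoting problems with paths.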
Project description:To elucidate the general rules of gene localization and regulation mediated by CpG islands, we reanalyzed published ChIP-seq data for the CXXC domain, H3K9me3, KDM2A, SUV39H1, ATF4, MYBL1, MYOD1, SPI1, and CTCF. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned with Bowtie 2 (version 2.2.5) to the mouse and human genomes (mm9 and hg19, respectively). Factor binding sites were identified with the peak caller MACS (Model-based Analysis of ChIP-Seq, version 1.4.2) using a p-value cutoff of 1e-5.
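A minimal sketch of the alignment and peak-calling steps as command lines, assuming single-end reads, a prebuilt Bowtie 2 index, and an input/control sample that has already been aligned. The sample names, index path and the assembly-to-genome-size mapping are hypothetical; only standard `bowtie2` (`-p`, `-x`, `-U`, `-S`) and `macs14` (`-t`, `-c`, `-f`, `-g`, `-n`, `-p`) options are used, and the study's exact settings beyond the p-value cutoff are not stated.

```python
from typing import List, Optional

# Hypothetical mapping from assembly to MACS 1.4 effective genome size shortcuts.
GENOME_SIZE = {"mm9": "mm", "hg19": "hs"}


def chipseq_commands(name: str, fastq: str, bowtie2_index: str, assembly: str,
                     control_sam: Optional[str] = None, threads: int = 8,
                     p_cutoff: float = 1e-5) -> List[List[str]]:
    """Build (but do not run) bowtie2 alignment and macs14 peak-calling commands."""
    sam = f"{name}.sam"
    align = ["bowtie2", "-p", str(threads), "-x", bowtie2_index,
             "-U", fastq, "-S", sam]
    peaks = ["macs14", "-t", sam, "-f", "SAM",
             "-g", GENOME_SIZE[assembly], "-n", name,
             "-p", f"{p_cutoff:g}"]  # 1e-5 is formatted as "1e-05"
    if control_sam is not None:
        # Assumption: the control (input) sample was aligned the same way.
        peaks += ["-c", control_sam]
    return [align, peaks]
```

Each returned list can be passed to `subprocess.run(..., check=True)` in order, because MACS reads the SAM file that the Bowtie 2 step writes.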
Project description:Purpose: To characterize the transcriptional profiles of murine cytomegalovirus (MCMV)-infected allografts after renal transplantation. Methods: RNA was isolated from murine allografts and native kidneys, with and without MCMV infection. Libraries were generated, and paired-end 150 bp sequencing was performed on a HiSeq 4000 (Illumina) (Supplementary Methods). Each sample was aligned to the GRCm38.p4 assembly of the mouse reference genome from NCBI using version 2.6.0c of the RNA-seq aligner STAR. Transcript features were identified from the GFF file provided with the NCBI assembly, and raw coverage counts were calculated using HTSeq. The raw RNA-seq gene expression data were normalized, and post-alignment statistical analyses were performed using DESeq2 and custom analysis scripts written in R. Gene expression comparisons and the associated statistical analyses between conditions of interest were based on the normalized read counts. All fold change values are expressed as test condition/control condition; values less than 1 are reported as the negative of their inverse. Results: QIAGEN Ingenuity Pathway Analysis (IPA) software was used for canonical pathway and differential gene expression analyses. IPA showed that, compared with MCMV-infected native kidneys, transplantation of MCMV-infected kidneys significantly altered the expression of 5,502 genes (adjusted p < 0.05) involved in 391 canonical pathways. The Th17 activation pathway contained 107 differentially expressed genes, and the Th1 pathway was among the most highly upregulated pathways in the MCMV-infected allografts. Conclusions: Transcripts for Th1/Th17 cell-associated activation and signaling are differentially expressed in MCMV-infected kidneys after allogeneic transplantation.
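Two calculations described above can be made concrete. The signed fold change follows the stated convention (test/control, with ratios below 1 reported as the negative of their inverse). For normalization, the sketch re-implements the median-of-ratios idea behind DESeq2's default size factors in plain Python; it is an illustration, not DESeq2 itself (in R, `estimateSizeFactors` is the real entry point), and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List


def signed_fold_change(test: float, control: float) -> float:
    """test/control; a ratio below 1 is reported as -(control/test)."""
    if test <= 0 or control <= 0:
        raise ValueError("normalized counts must be positive")
    ratio = test / control
    return ratio if ratio >= 1 else -1.0 / ratio


def size_factors(counts: List[List[float]]) -> List[float]:
    """Median-of-ratios size factors; counts[g][s] is gene g in sample s.

    Genes with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # log geometric mean of each usable gene across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

For example, `signed_fold_change(1, 4)` returns `-4.0`, meaning the gene is fourfold lower in the test condition. Dividing each sample's counts by its size factor yields the normalized counts on which such comparisons are made.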
Project description:To identify genes differentially expressed in HEK293T cells upon FKBP5 knockdown, total RNA was extracted from FKBP5-knockdown cells (n = 2, reported in this study) and scrambled-control cells (n = 2, reported earlier in Yadav et al., Cell Reports 2019, NCBI BioProject accession PRJNA512165) and analysed by RNA-seq.
Project description:The Sequence Read Archive (SRA) contains over 52 terabases, or 482 billion reads, from Drosophila melanogaster (as of June 2018). These data are massively underused by the community and include 14,423 RNA-Seq samples, roughly seven times the size of modENCODE. Currently, the major challenge is finding high-quality datasets that are suitable for inclusion in new studies. To help the community overcome this hurdle, we reprocessed all D. melanogaster RNA-Seq SRA experiments (SRXs) using an identical workflow. This workflow uses a data-driven approach to identify technical metadata (i.e., strandedness and layout) for each sample in order to optimize mapping parameters. The workflow generates QC metrics and coverage tracks based on the dm6 assembly, and calculates gene-level, junction-level, and intergenic counts against FlyBase r6.11. This resource will allow any researcher to visualize browser tracks for any publicly available dataset, quickly identify high-quality datasets for use in their own research, and download identically processed count tables. There is a treasure trove of underused data sitting in the SRA, and this work addresses the first challenge in making data integration a common laboratory practice.
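One common data-driven way to infer strandedness (similar in spirit to RSeQC's `infer_experiment.py`) is to align a subset of reads and count how many fall on the same strand as the annotated gene versus the opposite strand. The sketch below assumes those two counts are already available; the 0.75 cutoff and the 1,000-read minimum are hypothetical, not the workflow's actual thresholds. The labels reuse `htseq-count`'s `--stranded` values (`yes`, `reverse`, `no`) purely as a familiar vocabulary.

```python
def infer_strandedness(sense_reads: int, antisense_reads: int,
                       cutoff: float = 0.75, min_reads: int = 1000) -> str:
    """Classify a library from reads agreeing/disagreeing with gene strand.

    Assumptions: `cutoff` and `min_reads` are illustrative thresholds.
    Returns "yes" (reads match gene strand), "reverse" (reads are antisense,
    e.g. dUTP-style libraries), "no" (unstranded) or "undetermined".
    """
    total = sense_reads + antisense_reads
    if total < min_reads:
        return "undetermined"  # too few informative reads to decide
    sense_fraction = sense_reads / total
    if sense_fraction >= cutoff:
        return "yes"
    if sense_fraction <= 1 - cutoff:
        return "reverse"
    return "no"
```

A library with 9,000 sense and 1,000 antisense reads is classified as `"yes"`, while a roughly even split is treated as unstranded. A production workflow might add an explicit "ambiguous" band between the thresholds instead of defaulting to unstranded.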
Project description:DNA microarray and RNA-seq profiling were performed on samples from four controls and eight systemic sclerosis (SSc) patients to test the performance of intrinsic subset classification across two different gene expression profiling platforms. Samples N0901, N0903, N1002, N1003, SSc0882, SSc0916, SSc0918, and S0920 were previously deposited in NCBI SRA under PRJNA237826. The remaining four SSc RNA-seq samples will be available under PRJNAXXXXXX.
Project description:<p>Variability in induced pluripotent stem cell (iPSC) lines remains a roadblock for disease modeling and regenerative medicine. Using linear mixed models, we characterized the sources of gene expression variability in RNA sequencing data from 317 human iPSC lines derived from 101 individuals. We found that ~50% of genome-wide expression variability is explained by variation across individuals, and we identified a set of expression quantitative trait loci that contribute to this variation. These analyses, together with allele-specific expression, show that iPSCs retain a subject-specific gene expression pattern. Pathway enrichment and key driver analyses based on predictive causal gene networks found that Polycomb targets explain a significant part of the non-genetic variability in iPSCs within and across individuals. These publicly available iPSC lines and genetic datasets will be a resource for the scientific community and will open new avenues to reduce variability in iPSCs and improve their utility in disease modeling.</p> <p>SNP array data from individuals included in an RNA-seq transcriptome profiling study of human induced pluripotent stem cells, designed to characterize gene expression variation across individuals and among multiple iPSC lines from the same individual. Genotyping was performed on patient blood.</p> Data availability: <ul> <li>SNP-genotyping: dbGaP - current study</li> <li>RNA-seq counts: <a href="http://www.ncbi.nlm.nih.gov/geo/">GEO</a> - GSE79636</li> <li>FASTQ files: <a href="http://www.ncbi.nlm.nih.gov/sra">SRA</a> - SRP072417</li> </ul>
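The idea of "variability explained by individual" can be illustrated for a single gene with a simpler method-of-moments estimator: a one-way random-effects ANOVA with individual as the grouping factor and iPSC lines as replicates. This is a swapped-in technique for illustration, not the linear mixed models (typically fit by REML) used in the study. The function name and input layout are hypothetical.

```python
from typing import Dict, List


def fraction_variance_between(groups: Dict[str, List[float]]) -> float:
    """Fraction of one gene's expression variance attributable to individuals.

    `groups` maps an individual ID to expression values of that individual's
    iPSC lines. Uses the ANOVA (method-of-moments) variance components with
    the n0 correction for unbalanced group sizes; negative estimates are
    truncated to zero.
    """
    values = [v for vals in groups.values() for v in vals]
    n_total, k = len(values), len(groups)
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 individuals and at least one with replicates")
    grand_mean = sum(values) / n_total
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2
                     for g, vals in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # effective number of lines per individual for unbalanced designs
    n0 = (n_total - sum(len(vals) ** 2 for vals in groups.values()) / n_total) / (k - 1)
    var_between = max(0.0, (ms_between - ms_within) / n0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0
```

Applied gene by gene and averaged, such per-gene fractions give a rough genome-wide picture. A mixed model is preferred in practice because it can also include covariates and further random effects.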
Project description:The aim of this sequencing experiment was to make liver tissue expression data available for selected fish species, northern pike (Esox lucius, Eluc), coho salmon (Oncorhynchus kisutch, Okis) and Arctic charr (Salvelinus alpinus, Salp), for comparative expression studies across species. Samples were collected in replicates of four, and fish were sacrificed according to the protocols of each facility from which samples were obtained. RNA was extracted and Illumina TruSeq Stranded mRNA libraries were built. Sequencing was performed in two passes on an Illumina HiSeq 2500 (paired-end, 125 bp reads). Processed count tables for each species, as raw counts, FPKM, or TPM, were generated by aligning reads to the respective NCBI genome with STAR and quantifying gene-level expression with RSEM using the NCBI gene annotation.
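The three table types differ only in how counts are normalized. The sketch below shows the standard FPKM and TPM formulas from counts and gene lengths. As a simplifying assumption it uses plain gene lengths and integer counts, whereas RSEM works with expected counts and effective lengths, so its values will differ somewhat. The function name is hypothetical.

```python
from typing import List, Tuple


def fpkm_and_tpm(counts: List[float],
                 lengths_bp: List[float]) -> Tuple[List[float], List[float]]:
    """Standard FPKM and TPM from per-gene fragment counts and lengths.

    Assumption: plain gene lengths stand in for RSEM's effective lengths.
    """
    total = sum(counts)
    # FPKM: fragments per kilobase of gene per million mapped fragments
    fpkm = [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]
    # TPM: length-normalize first, then scale so the sample sums to 1e6
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    tpm = [r / rate_sum * 1e6 for r in rates]
    return fpkm, tpm
```

Because every TPM column sums to one million, TPM values are easier to compare between samples than FPKM. Within a sample, TPM equals FPKM divided by the sample's FPKM total, times 10^6.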