Project description:To elucidate the general rules for gene localization and regulation mediated by CpG islands, we reanalyzed published ChIP-seq data for the CXXC domain, H3K9me3, KDM2A, SUV39H1, ATF4, MYBL1, MYOD1, SPI1, and CTCF. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned with Bowtie 2.2.5 to the mouse and human genomes (mm9 and hg19, respectively). Factor binding sites were identified with the peak caller MACS (Model-based Analysis of ChIP-Seq) 1.4.2 using a p-value cutoff of 1e-5.
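The extract, align, and peak-call steps described above can be sketched as a small command builder. This is only an illustration under stated assumptions, not the submitters' actual script: it assumes single-end reads, uncompressed FASTQ output named after the run, and SAM input to MACS. The function name, the run accession `SRR0000000`, and the index name are hypothetical placeholders. Nothing is executed; the function only returns argument lists.

```python
def build_chipseq_commands(run_accession, bowtie2_index, genome_size, control_sam=None):
    """Return argv lists for extract -> align -> peak calling (not executed)."""
    fastq = f"{run_accession}.fastq"  # assumed fastq-dump output name (single-end)
    sam = f"{run_accession}.sam"

    # SRA Toolkit: extract FASTQ from the downloaded run
    extract = ["fastq-dump", run_accession]

    # Bowtie 2: align unpaired reads (-U) against a prebuilt index (-x), write SAM (-S)
    align = ["bowtie2", "-x", bowtie2_index, "-U", fastq, "-S", sam]

    # MACS 1.4: call peaks with the p-value cutoff stated in the description
    call_peaks = [
        "macs14", "-t", sam, "-f", "SAM",
        "-g", genome_size,       # "mm" for mouse, "hs" for human
        "-n", run_accession,
        "-p", "1e-5",
    ]
    if control_sam is not None:
        call_peaks += ["-c", control_sam]  # optional input/control sample

    return [extract, align, call_peaks]


if __name__ == "__main__":
    # Hypothetical accession and index name, for illustration only
    for argv in build_chipseq_commands("SRR0000000", "mm9", "mm"):
        print(" ".join(argv))
```

Each returned list could be passed to `subprocess.run(argv, check=True)` once the tools and a genome index are available locally.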
Project description:We reanalyzed published RNA-seq data to study 1) the genomic landscape of the regions surrounding transcription start sites in relation to gene expression activity and 2) changes in gene expression upon depletion of the transcription factors MYBL1 and ATF4. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned with STAR 2.4.2 to the mouse and human genomes (mm9 and hg19, respectively). Gene expression was calculated as RPKM values using rpkmforgenes.py (Ramsköld et al., 2009).
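For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) can be illustrated with the textbook formula. The sketch below is not the rpkmforgenes.py implementation, which handles details such as read assignment and mappable length that are not reproduced here; the function name and the example numbers are illustrative assumptions.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Textbook RPKM: reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length in kb) / (library size in millions)
    # == reads * 1e9 / (length in bp * total mapped reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Made-up example: 500 reads on a 2 kb gene in a library of 10 million mapped reads
    print(rpkm(500, 2000, 10_000_000))
```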
Project description:To test the global effects of CpG island-centered gene regulation on gene expression profiles, poly(A)+ (pA+) RNA-seq data from diverse tissues and cell lines were gathered and profiled. All available mouse poly(A)+ RNA-seq data (3,818 samples) were catalogued and downloaded on May 5, 2015. After excluding single-cell RNA-seq experiments and experiments in which few genes were detected as expressed (fewer than 5,000 genes with an RPKM of 0.5 or higher), 1,524 high-quality RNA-seq datasets were used. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned with STAR 2.4.2 to the mouse and human genomes (mm9 and hg19, respectively). Gene expression was calculated as RPKM values using rpkmforgenes.py (Ramsköld et al., 2009).
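The sample-level quality filter described above (keep a sample only if at least 5,000 genes reach an RPKM of 0.5) can be expressed as a short check. This is a minimal sketch, assuming each sample is available as a mapping from gene identifier to RPKM; the function names and that data layout are assumptions, not part of the original workflow.

```python
def count_expressed_genes(rpkm_by_gene, min_rpkm=0.5):
    """Count genes whose RPKM is at or above the expression threshold."""
    return sum(1 for value in rpkm_by_gene.values() if value >= min_rpkm)


def passes_quality_filter(rpkm_by_gene, min_rpkm=0.5, min_genes=5000):
    """True if the sample has at least `min_genes` genes with RPKM >= `min_rpkm`."""
    return count_expressed_genes(rpkm_by_gene, min_rpkm) >= min_genes


def select_samples(samples):
    """Return the names of samples that pass the filter, keeping input order.

    `samples` maps a sample name to its {gene: RPKM} dictionary.
    """
    return [name for name, profile in samples.items() if passes_quality_filter(profile)]
```

The thresholds are left as keyword arguments so the cutoff can be adjusted without changing the logic.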
Project description:This SuperSeries is composed of the following subset Series: GSE29043: MicroRNAs and their isomiRs function cooperatively to target common biological pathways (Illumina expression beadchip); GSE29100: MicroRNAs and their isomiRs function cooperatively to target common biological pathways (Agilent miRNA array). Refer to individual Series. This represents the miRNA placenta samples and mRNA + miRNA pulldown only. The miRNA-Seq data have been submitted to the Short Read Archive under SRA number SRP006043: http://www.ncbi.nlm.nih.gov/sra?term=SRP006043
Project description:The Sequence Read Archive (SRA) contains over 52 terabases, or 482 billion reads, from Drosophila melanogaster (as of June 2018). These data are massively underused by the community and include 14,423 RNA-Seq samples, roughly 7 times the size of modENCODE. Currently, the major challenge is finding high-quality datasets that are suitable for inclusion in new studies. To help the community overcome this hurdle, we re-processed all D. melanogaster RNA-Seq SRA experiments (SRXs) using an identical workflow. This workflow uses a data-driven approach to identify technical metadata (i.e., strandedness and layout) for each sample in order to optimize mapping parameters. The workflow generates QC metrics and coverage tracks based on the dm6 assembly, and calculates gene-level, junction-level, and intergenic counts against FlyBase r6.11. This resource will allow any researcher to visualize browser tracks for any publicly available dataset, quickly identify high-quality datasets for use in their own research, and download identically processed count tables. There is a treasure trove of underused data sitting in the SRA, and this work addresses the first challenge in making data integration a common laboratory practice.
Project description:This experiment captures the expression data reported by the RIKEN FANTOM5 project (http://fantom.gsc.riken.jp/5/), focusing on mouse tissue data, which were deposited in the Sequence Read Archive (SRA) under study accession DRP001032 (https://www.ebi.ac.uk/ena/data/view/DRP001031). The samples in this experiment can also be found on a dedicated page of the FANTOM website: http://fantom.gsc.riken.jp/5/sstar/Browse_samples. Since this is a CAGE analysis, gene expression data are reported by FANTOM5 in TPM (tags per million) for gene promoters. This experiment is to be used in conjunction with E-MTAB-3578 (http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3578/)
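As a reminder of what the TPM unit means for CAGE data, the sketch below scales raw tag counts per promoter so that they sum to one million. It shows the plain library-size scaling only and is not a reproduction of the FANTOM5 processing pipeline, which may apply further normalization; the function name and the promoter labels are made up for illustration.

```python
def tags_per_million(tag_counts):
    """Scale raw CAGE tag counts so the values across all promoters sum to 1e6."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {promoter: count * 1_000_000 / total for promoter, count in tag_counts.items()}


if __name__ == "__main__":
    # Made-up counts for two hypothetical promoters
    print(tags_per_million({"promoter_A": 30, "promoter_B": 70}))
```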
Project description:We report the results of chromatin immunoprecipitation followed by high-throughput tag sequencing (ChIP-Seq) using the GA II platform from Illumina for the human transcription factor STAT1 in HeLa S3 cells. The STAT1 ChIP was performed using HeLa S3 cells stimulated with gamma-interferon. We have also generated a sequenced input DNA dataset for gamma-interferon-stimulated HeLa S3 cells. Raw data for this study are available for download from the Short Read Archive database at: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP000703. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf