Dataset Information

Remapping the SRA: Drosophila melanogaster RNA-Seq data from the Sequence Read Archive

ABSTRACT: The Sequence Read Archive (SRA) contains over 52 terabases (482 billion reads) from Drosophila melanogaster as of June 2018. These data are massively underused by the community and include 14,423 RNA-Seq samples, roughly 7 times the size of modENCODE. Currently, the major challenge is finding high-quality datasets that are suitable for inclusion in new studies. To help the community overcome this hurdle, we re-processed all D. melanogaster RNA-Seq SRA experiments (SRXs) using an identical workflow. This workflow uses a data-driven approach to identify technical metadata (i.e., strandedness and layout) for each sample in order to optimize mapping parameters. The workflow generates QC metrics and coverage tracks based on the dm6 assembly, and calculates gene-level, junction-level, and intergenic counts against FlyBase r6.11. This resource will allow any researcher to visualize browser tracks for any publicly available dataset, quickly identify high-quality datasets for use in their own research, and download identically processed counts tables. There is a treasure trove of underused data sitting in the SRA, and this work addresses the first challenge in making data integration a common laboratory practice.
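The abstract does not say how strandedness is determined from the data. As a purely illustrative sketch, and not the authors' method, one common idea is to compare how many reads align to the sense versus the antisense strand of annotated genes. The function name `infer_strandedness`, its labels, and the 0.75 threshold are all hypothetical assumptions made for this example.

```python
def infer_strandedness(sense_reads: int, antisense_reads: int,
                       threshold: float = 0.75) -> str:
    """Classify a library from read counts on each strand of annotated genes.

    The threshold is an assumed cutoff for illustration only; it is not
    taken from the dataset description.
    """
    total = sense_reads + antisense_reads
    if total == 0:
        # No informative reads, so no call can be made.
        return "undetermined"
    sense_fraction = sense_reads / total
    if sense_fraction >= threshold:
        # Most reads match the annotated gene strand.
        return "same_strand"
    if sense_fraction <= 1 - threshold:
        # Most reads match the strand opposite the annotated gene.
        return "opposite_strand"
    # Reads are split between strands, as expected for unstranded libraries.
    return "unstranded"


if __name__ == "__main__":
    # Hypothetical counts for three samples.
    for counts in [(950, 50), (50, 950), (500, 500)]:
        print(counts, infer_strandedness(*counts))
```

A call of this kind could then select strand-specific mapping or counting options per sample, which is the sort of parameter optimization the abstract refers to.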

ORGANISM(S): Drosophila melanogaster

PROVIDER: GSE117217 | GEO | 2018/07/18

REPOSITORIES: GEO

Similar Datasets

Homo sapiens

Project description:NCBI-generated Human raw gene counts from SRA RNA-seq data

| PRJNA1023937 | ENA

Mus musculus

Project description:NCBI-generated Mouse raw gene counts from SRA RNA-seq data

| PRJNA1023938 | ENA

Track normalization and averaging of H3K4me3 ChIP-seq data across various cell and tissue types from Mouse ENCODE.

Project description:Data tracks from chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) experiments were processed using an in-house algorithm that provides normalization functionality followed by generation of a track average.

2015-10-09 | GSE73834 | GEO

Track normalization and averaging of bisulfite-seq DNA methylation data across various cell and tissue types

2015-01-20 | GSE64577 | GEO

Track normalization and averaging of H3K4me3 ChIP-seq data across various cell and tissue types.

Project description:Data tracks from chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) experiments were sorted by tissue or cell types and processed using an in-house algorithm that provides normalization functionality followed by generation of a track average.

2015-01-20 | GSE65049 | GEO

CpG island mediated linear and spatial gene partitioning (pA+ RNA-seq profile)

Project description:To test the global effects of CpG island-centered gene regulation on the global gene expression profile, pA+ RNA-seq data from diverse tissues and cell lines were gathered and profiled. All available mouse poly-A positive RNA-seq data (3,818 samples) were summarized and downloaded on May 5, 2015. After excluding single-cell RNA-seq data and experiments with few expressed genes (fewer than 5,000 genes with RPKM of 0.5 or higher), 1,524 high-quality RNA-seq datasets were used. Raw data were downloaded from the Sequence Read Archive (SRA) in the National Center for Biotechnology Information (NCBI) database. FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned using STAR 2.4.2 onto the mouse and human genomes (mm9 and hg19, respectively). Gene expression was calculated as RPKM values using rpkmforgenes.py (Ramsköld et al., 2009).

2018-02-15 | GSE80797 | GEO

Acute effects of active breaks during prolonged sitting on subcutaneous adipose tissue gene expression

Project description:Breaking up prolonged periods of time spent sitting has a range of beneficial impacts on cardiometabolic risk biomarkers. The molecular mechanisms include regulation of skeletal muscle gene and protein expression controlling metabolic, inflammatory and cell development pathways. An active communication network exists between adipose and muscle tissue, but the effect of active breaks in prolonged sitting on adipose tissue has yet to be investigated. This study characterised the acute transcriptional events induced in adipose tissue by regular active breaks during prolonged sitting. In a subset of 8 overweight/obese adults participating in an acute randomised three-intervention crossover trial, subcutaneous adipose tissue biopsies were obtained after each condition. The three experimental conditions were conducted in the postprandial state and included: i) prolonged uninterrupted sitting; or prolonged sitting interrupted with 2-minute bouts of ii) light- or iii) moderate-intensity treadmill walking every 20 minutes. Microarrays identified 36 differentially expressed genes between the three conditions (fold change≥0.5 in either direction; p<0.05). Pathway analysis indicated that breaking up prolonged sitting led to differential regulation of adipose tissue metabolic networks and inflammatory pathways, increased insulin signalling, increased adipocyte turnover, and facilitated cross-talk between adipose tissue and other organs. This study provides insight into the adipose tissue regulatory systems and transcriptional processes that contribute to the physiological benefits of interrupting prolonged sitting.

2018-06-13 | GSE115645 | GEO

Functional and Structural Segregation of Overlapping Helices in HIV-1

Project description:We report deep mutational scanning data for the LLP-2 domain of the Env protein in HIV-1 strain NL4-3. Processed data represent counts for each amino acid pre- and post-spread.

2021-07-15 | GSE179046 | GEO

Track normalization and averaging of bisulfite-seq DNA methylation data across various cell and tissue types

Project description:Data tracks from bisulfite sequencing (BS-seq) experiments were sorted by tissue or cell types and processed using an in-house algorithm that provides normalization functionality followed by generation of a track average. Re-analysis of Roadmap Epigenomics DNA methylation datasets using an in-house algorithm to create an average data track.

2015-01-20 | E-GEOD-64577 | biostudies-arrayexpress

A computational workflow for the analysis of 3’ Tag-Seq data

Project description:RNA-sequencing (RNA-seq) is a ubiquitous tool to profile genome-wide changes in gene expression. RNA-seq uses high-throughput sequencing technology to quantify the amount of RNA in a biological sample. With the increasing popularity of RNA-seq, many variations on the protocol have been proposed to extract unique and relevant information from biological samples. 3’ Tag-Seq (also called TagSeq, 3′ Tag-RNA-Seq, and Quant-Seq 3′ mRNA-Seq) is one RNA-seq variation, where the 3’ end of the transcript is selected and amplified to yield one copy of cDNA from each transcript in the biological sample. We present a simple, easy-to-use, and publicly available computational workflow to analyze 3’ Tag-Seq data. The workflow begins by trimming sequence adapters from raw FASTQ files. The trimmed sequence reads are checked for quality using FastQC and aligned to the reference genome, and read counts are obtained using STAR. Differential gene expression analysis is performed on the gene count data using DESeq2. The outputs of this workflow are MA plots, tables of differentially expressed genes, and UpSet plots. This protocol is intended for users specifically interested in analyzing 3’ Tag-Seq data. As such, transcript length-based normalizations are not performed within the workflow. Future updates to this workflow could include custom analyses based on the gene counts table as well as data visualization enhancements.

2022-10-13 | GSE200778 | GEO
