Dataset Information

An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.

ABSTRACT: RNA-seq analysis is becoming a standard method for global gene expression profiling. However, open and standard pipelines to perform RNA-seq analysis by non-experts remain challenging due to the large size of the raw data files and the hardware requirements for running the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes that are downregulated by ZIKV, our analysis uncovers significant overlap with upregulated genes that when knocked out in mice induce defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns from pregnant mothers infected with the virus. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at: http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.

SUBMITTER: Wang Z

PROVIDER: S-EPMC4972086 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.

Wang Zichen Z Ma'ayan Avi A

F1000Research 20160705

RNA-seq analysis is becoming a standard method for global gene expression profiling. However, open and standard pipelines to perform RNA-seq analysis by non-experts remain challenging due to the large size of the raw data files and the hardware requirements for running the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configur ...[more]

PMID: 27583132

Similar Datasets

Project description:BackgroundDetection of exotic plant pathogens and preventing their entry and establishment are critical for the protection of agricultural systems while securing the global trading of agricultural commodities. High-throughput sequencing (HTS) has been applied successfully for plant pathogen discovery, leading to its current application in routine pathogen detection. However, the analysis of massive amounts of HTS data has become one of the major challenges for the use of HTS more broadly as a rapid diagnostics tool. Several bioinformatics pipelines have been developed to handle HTS data with a focus on plant virus and viroid detection. However, there is a need for an integrative tool that can simultaneously detect a wider range of other plant pathogens in HTS data, such as bacteria (including phytoplasmas), fungi, and oomycetes, and this tool should also be capable of generating a comprehensive report on the phytosanitary status of the diagnosed specimen.ResultsWe have developed an open-source bioinformatics pipeline called PhytoPipe (Phytosanitary Pipeline) to provide the plant pathology diagnostician community with a user-friendly tool that integrates analysis and visualization of HTS RNA-seq data. PhytoPipe includes quality control of reads, read classification, assembly-based annotation, and reference-based mapping. The final product of the analysis is a comprehensive report for easy interpretation of not only viruses and viroids but also bacteria (including phytoplasma), fungi, and oomycetes. PhytoPipe is implemented in Snakemake workflow with Python 3 and bash scripts in a Linux environment. The source code for PhytoPipe is freely available and distributed under a BSD-3 license.ConclusionsPhytoPipe provides an integrative bioinformatics pipeline that can be used for the analysis of HTS RNA-seq data. PhytoPipe is easily installed on a Linux or Mac system and can be conveniently used with a Docker image, which includes all dependent packages and software related to analyses. It is publicly available on GitHub at https://github.com/healthyPlant/PhytoPipe and on Docker Hub at https://hub.docker.com/r/healthyplant/phytopipe .

Dataset Information

An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.

Publications

An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets