Dataset Information

Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce.


ABSTRACT:

Motivation

Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyze dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.

Results

We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyze dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.

Availability and implementation

Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/.

Contacts

anellore@gmail.com or langmea@cs.jhu.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Nellore A 

PROVIDER: S-EPMC4978928 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC4709609 | biostudies-literature
S-EPMC2928508 | biostudies-literature
S-EPMC5925781 | biostudies-literature
S-EPMC5210596 | biostudies-literature
S-EPMC6954005 | biostudies-literature
S-EPMC4045712 | biostudies-literature
S-EPMC4350034 | biostudies-literature
S-EPMC6207866 | biostudies-literature
S-EPMC9910747 | biostudies-literature
S-EPMC6290780 | biostudies-literature