Dataset Information

Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

ABSTRACT: Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. To address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, so from a user's perspective running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, a university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.
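
The abstract describes a "meta-script" that starts a container and then drives the pipeline through BioBlend and the Galaxy API. The following is a minimal sketch of that idea, not the authors' actual script. The image name, the mounted /data path, the container port 80, the history name, and the workflow input index "0" are all hypothetical assumptions made for illustration. The BioBlend import is deferred so that the command-building part runs without BioBlend installed.

```python
def build_docker_command(image, host_data_dir, host_port=8080):
    """Return the argv list that would start a container in the background.

    Assumptions (not from the paper): Galaxy listens on container port 80
    and input data is mounted at /data inside the container.
    """
    return [
        "docker", "run", "--detach",
        "--publish", f"{host_port}:80",
        "--volume", f"{host_data_dir}:/data",
        image,
    ]


def run_pipeline(galaxy_url, api_key, workflow_name, input_path):
    """Upload one input file and invoke a named Galaxy workflow on it."""
    # Imported here so the rest of the sketch works without BioBlend.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    # A fresh history keeps the outputs of this run together.
    history = gi.histories.create_history(name="bio-docklet-run")
    upload = gi.tools.upload_file(input_path, history["id"])
    dataset_id = upload["outputs"][0]["id"]
    # Look up the preconfigured workflow by name.
    workflows = gi.workflows.get_workflows(name=workflow_name)
    if not workflows:
        raise ValueError(f"No workflow named {workflow_name!r}")
    # Hypothetical mapping: the workflow's first input step has index "0".
    inputs = {"0": {"src": "hda", "id": dataset_id}}
    invocation = gi.workflows.invoke_workflow(
        workflows[0]["id"], inputs=inputs, history_id=history["id"]
    )
    return invocation["id"]


# Hypothetical image name and data directory, for illustration only.
cmd = build_docker_command("example/bio-docklet-rnaseq", "/home/user/reads")
print(" ".join(cmd))
```

In practice the command list would be passed to subprocess.run, and run_pipeline would be called once the Galaxy server inside the container is reachable. Launching many containers on different host ports is one way the concurrent, multi-dataset use mentioned in the abstract could be approached.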

SUBMITTER: Kim B

PROVIDER: S-EPMC5569920 | biostudies-literature | 2017 Aug

REPOSITORIES: biostudies-literature

Publications

Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

Baekdoo Kim, Thahmina Ali, Carlos Lijeron, Enis Afgan, Konstantinos Krampis

GigaScience, 2017 Aug 1, issue 8

PMID: 28854616

Similar Datasets

The impact of Docker containers on the performance of genomic pipelines.
| S-EPMC4586803 | biostudies-literature

Tibanna: software for scalable execution of portable pipelines on the cloud.
| S-EPMC6931271 | biostudies-literature

Single-Step 3D Printing of Bio-Inspired Printable Joints Applied to a Prosthetic Hand.
| S-EPMC10599432 | biostudies-literature

Systematic and benchmarking studies of pipelines for mammal WGBS data in the novel NGS platform.
| S-EPMC9890740 | biostudies-literature

FVC as an adaptive and accurate method for filtering variants from popular NGS analysis pipelines.
| S-EPMC9481582 | biostudies-literature

Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses.
| S-EPMC4226638 | biostudies-literature

Primary cell wall inspired micro containers as a step towards a synthetic plant cell.
| S-EPMC7031234 | biostudies-literature

Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels.
| S-EPMC8080428 | biostudies-literature

miCloud: A Plug-n-Play, Extensible, On-Premises Bioinformatics Cloud for Seamless Execution of Complex Next-Generation Sequencing Data Analysis Pipelines.
| S-EPMC6441280 | biostudies-literature