Dataset Information

Bioinformatics pipeline using JUDI: Just Do It!


ABSTRACT:

Summary

Large-scale data analysis in bioinformatics requires pipelined execution of multiple software tools. Each stage in a pipeline generally consumes considerable computing resources, so several workflow management systems (WMS), e.g. Snakemake, Nextflow, Common Workflow Language and Galaxy, have been developed to ensure optimal execution of the stages across two invocations of the pipeline. However, when the pipeline must be executed under different parameter settings, e.g. different thresholds or underlying algorithms, these WMS require significant scripting to ensure optimal execution. We developed JUDI on top of DoIt, a Python-based WMS, to handle parameter settings systematically, based on the principles of database management systems. Using a novel modular approach that encapsulates a parameter database in each task and file associated with a pipeline stage, JUDI simplifies plug-and-play of the pipeline stages. For a typical pipeline with n parameters, JUDI reduces the number of lines of scripting required by a factor of O(n). With properly designed parameter databases, JUDI not only enables reproducing research under published parameter values but also facilitates exploring new results under novel parameter settings.
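
The idea of attaching a parameter database to each file and task can be illustrated with a minimal sketch in plain Python. This is not JUDI's API: the parameter names and values, the helper functions and the path layout are all hypothetical, and the placeholder shell command stands in for a real tool. The only convention borrowed from elsewhere is the shape of the yielded dictionaries (name, actions, file_dep, targets), which follows the task format used by DoIt.

```python
from itertools import product

# Hypothetical parameter space; names and values are illustrative only.
PARAMS = {
    "sample": ["s1", "s2"],
    "aligner": ["bwa", "bowtie2"],
    "threshold": [0.01, 0.05],
}


def param_table(params, keys):
    """Return one row (a dict) per combination of the selected parameters."""
    return [dict(zip(keys, values))
            for values in product(*(params[k] for k in keys))]


def tag(row):
    """Encode a parameter row as a string, e.g. 'sample~s1,aligner~bwa'."""
    return ",".join(f"{k}~{v}" for k, v in row.items())


def path_for(stem, row, ext):
    """Derive a file path from the parameter row attached to that file."""
    return f"{stem}/{tag(row)}.{ext}"


def task_align():
    """DoIt-style task generator: one sub-task per (sample, aligner) row.

    The input file depends only on 'sample', the output on both parameters,
    so each file carries exactly the parameters that affect it.
    """
    for row in param_table(PARAMS, ["sample", "aligner"]):
        reads = path_for("reads", {"sample": row["sample"]}, "fastq")
        bam = path_for("aligned", row, "bam")
        yield {
            "name": tag(row),
            # Placeholder command; a real pipeline would call an aligner here.
            "actions": [f"echo align {reads} with {row['aligner']} > {bam}"],
            "file_dep": [reads],
            "targets": [bam],
        }
```

In this sketch, adding a new value to a parameter or a new parameter to a stage changes only the table passed to the generator, not the body of the task, which is the kind of scripting saving the summary describes.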

Availability and implementation

https://github.com/ncbi/JUDI.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Pal S 

PROVIDER: S-EPMC7868055 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

Publications

Bioinformatics pipeline using JUDI: Just Do It!

Soumitra Pal, Teresa M. Przytycka

Bioinformatics (Oxford, England), 2020-04-01, issue 8

