
Dataset Information


Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services.


ABSTRACT: We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.

SUBMITTER: Madduri RK 

PROVIDER: S-EPMC4203657 | biostudies-other | 2014 Sep

REPOSITORIES: biostudies-other


Publications

Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services.

Madduri RK, Sulakhe D, Lacinski L, Liu B, Rodriguez A, Chard K, Dave UJ, Foster IT

Concurrency and Computation: Practice & Experience, 2014-09-01, 13



Similar Datasets

S-EPMC3321268 | biostudies-other
S-EPMC2903720 | biostudies-literature
S-EPMC8054753 | biostudies-literature
S-EPMC6508881 | biostudies-literature
S-EPMC3394276 | biostudies-literature
S-EPMC5157836 | biostudies-literature
S-EPMC4134471 | biostudies-literature
S-EPMC3364462 | biostudies-literature
S-EPMC5675877 | biostudies-literature
S-EPMC7938325 | biostudies-literature