Dataset Information

Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services.


ABSTRACT:

Objective

Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies.

Methods

We provide a straightforward and innovative methodology to optimize cloud configuration for conducting genome-wide association studies. We used Spark clusters on both Google Cloud Platform and Amazon Web Services, together with Hail (http://doi.org/10.5281/zenodo.2646680), to analyze and explore genomic variant datasets.

Results

Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics.

Conclusions

We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
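
The speed-versus-cost question can be made concrete with a small sketch. The Python below is only an illustration, not the authors' method: the `ClusterRun` class, the `pareto_front` helper, the configuration names, the prices, and the runtimes are all hypothetical and are not taken from the study. It assumes cost is simply workers × hourly price per worker × runtime, and it keeps the configurations that no other configuration beats on both runtime and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRun:
    """One benchmarked cluster configuration (all values hypothetical)."""
    name: str
    workers: int
    price_per_worker_hour: float  # assumed flat USD price, not a real quote
    runtime_hours: float

    @property
    def cost(self) -> float:
        # Simplified cost model: ignores storage, egress, and master-node fees.
        return self.workers * self.price_per_worker_hour * self.runtime_hours


def pareto_front(runs):
    """Return runs that are not dominated on (runtime, cost), fastest first."""
    front = []
    for r in runs:
        dominated = any(
            o.runtime_hours <= r.runtime_hours
            and o.cost <= r.cost
            and (o.runtime_hours < r.runtime_hours or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.runtime_hours)


# Made-up measurements, for illustration only.
runs = [
    ClusterRun("small", workers=4, price_per_worker_hour=0.20, runtime_hours=10.0),
    ClusterRun("medium", workers=16, price_per_worker_hour=0.20, runtime_hours=3.0),
    ClusterRun("large", workers=64, price_per_worker_hour=0.20, runtime_hours=1.5),
    ClusterRun("wasteful", workers=64, price_per_worker_hour=0.20, runtime_hours=3.5),
]

for run in pareto_front(runs):
    print(f"{run.name}: {run.runtime_hours:.1f} h, ${run.cost:.2f}")
```

In this made-up example the "wasteful" configuration is dropped because "medium" is both faster and cheaper; the remaining three represent genuine trade-offs, and choosing among them depends on how much a shorter runtime is worth.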

SUBMITTER: Krissaane I 

PROVIDER: S-EPMC7534581 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services.

Inès Krissaane, Carlos De Niz, Alba Gutiérrez-Sacristán, Gabor Korodi, Nneka Ede, Ranjay Kumar, Jessica Lyons, Arjun Manrai, Chirag Patel, Isaac Kohane, Paul Avillach

Journal of the American Medical Informatics Association (JAMIA), 2020-09-01, issue 9


Similar Datasets

| S-EPMC5675877 | biostudies-literature
| S-EPMC8054753 | biostudies-literature
| S-EPMC7745556 | biostudies-literature
| S-EPMC2755485 | biostudies-literature
| S-EPMC4203657 | biostudies-other
| S-EPMC11299028 | biostudies-literature
| S-EPMC4986243 | biostudies-literature
| S-EPMC3145836 | biostudies-literature
| S-EPMC8862539 | biostudies-literature
| S-EPMC4906574 | biostudies-literature