Dataset Information

SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.


ABSTRACT:

Summary

We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.
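
The integration step described above, intersecting GWAS variants with functional annotation intervals, can be illustrated with a minimal plain-Python sketch. This is not SparkINFERNO's implementation: it uses neither Apache Spark nor Giggle, and every name, record and the 5e-8 significance cutoff (a common GWAS convention, assumed here) is hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(annotations):
    """Group (chrom, start, end, label) intervals by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, label in annotations:
        by_chrom[chrom].append((start, end, label))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        # Keep the start coordinates separately so they can be binary-searched.
        index[chrom] = ([s for s, _, _ in intervals], intervals)
    return index


def overlapping_labels(index, chrom, pos):
    """Return labels of half-open intervals [start, end) that contain pos."""
    starts, intervals = index.get(chrom, ([], []))
    # Only intervals whose start is <= pos can contain pos.
    hi = bisect_right(starts, pos)
    return [label for _, end, label in intervals[:hi] if pos < end]


def annotate_variants(variants, index, p_threshold=5e-8):
    """Keep variants at or below the p-value cutoff and attach overlapping labels."""
    results = []
    for variant_id, chrom, pos, p_value in variants:
        if p_value > p_threshold:
            continue
        results.append((variant_id, overlapping_labels(index, chrom, pos)))
    return results


# Hypothetical annotation intervals and GWAS summary rows (not real data).
annotations = [
    ("chr1", 100, 200, "enhancer:liver"),
    ("chr1", 150, 300, "TFBS:brain"),
    ("chr2", 50, 80, "eQTL:blood"),
]
variants = [
    ("rsA", "chr1", 160, 1e-9),
    ("rsB", "chr1", 250, 3e-8),
    ("rsC", "chr2", 60, 0.01),
    ("rsD", "chr2", 90, 1e-10),
]

index = build_index(annotations)
hits = annotate_variants(variants, index)
for variant_id, labels in hits:
    print(variant_id, labels)
```

A per-chromosome sorted list with binary search is one simple way to avoid scanning intervals that start after the variant; a dedicated genomic index and a distributed engine, as the abstract describes, would replace this for data at real scale.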

Availability and implementation

SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.

Contact

lswang@pennmedicine.upenn.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Kuksa PP 

PROVIDER: S-EPMC7320617 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.

Pavel P. Kuksa, Chien-Yueh Lee, Alexandre Amlie-Wolf, Prabhakaran Gangadharan, Elizabeth E. Mlynarski, Yi-Fan Chou, Han-Jen Lin, Heather Issen, Emily Greenfest-Allen, Otto Valladares, Yuk Yee Leung, Li-San Wang

Bioinformatics (Oxford, England), 2020-06-01, issue 12


Similar Datasets

| S-EPMC6158604 | biostudies-literature
| S-EPMC7316086 | biostudies-literature
| S-EPMC8636496 | biostudies-literature
| S-EPMC4572001 | biostudies-literature
| S-EPMC5702242 | biostudies-literature
2023-10-16 | GSE225817 | GEO
2022-11-16 | GSE185941 | GEO
| S-EPMC3431185 | biostudies-literature