ABSTRACT: Summary
We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.
Availability and implementation
SparkINFERNO runs on clusters or a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.
Contact
lswang@pennmedicine.upenn.edu.
Supplementary information
Supplementary data are available at Bioinformatics online.
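Illustrative sketch (not part of the original record): the Summary describes integrating GWAS variants with functional genomics annotations across tissues. The toy Python below shows only the general idea of overlapping variant positions with annotated intervals and ranking variants by the number of supporting annotations. It is not SparkINFERNO's API or algorithm: every name and data value is hypothetical, and a plain linear scan stands in for the Giggle-based indexing and Apache Spark parallelism that the abstract mentions.

```python
from collections import defaultdict

# Hypothetical toy inputs (assumptions, not data from the paper).
# Variants: (variant_id, chromosome, position).
variants = [
    ("rs_a", "chr1", 150),
    ("rs_b", "chr1", 900),
    ("rs_c", "chr2", 75),
]
# Annotations: (chromosome, start, end, annotation_type, tissue),
# with intervals treated as half-open [start, end).
annotations = [
    ("chr1", 100, 200, "enhancer", "brain"),
    ("chr1", 120, 180, "tf_binding", "liver"),
    ("chr2", 50, 100, "enhancer", "blood"),
]


def annotate(variants, annotations):
    """Map each variant id to the sorted (type, tissue) pairs it overlaps."""
    # Group intervals by chromosome so each variant scans only its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, kind, tissue in annotations:
        by_chrom[chrom].append((start, end, kind, tissue))

    hits = {}
    for variant_id, chrom, pos in variants:
        hits[variant_id] = sorted(
            (kind, tissue)
            for start, end, kind, tissue in by_chrom[chrom]
            if start <= pos < end
        )
    return hits


hits = annotate(variants, annotations)
# Rank variants by number of overlapping annotations, ties broken by id.
ranked = sorted(hits, key=lambda v: (-len(hits[v]), v))
```

A real pipeline would replace the inner scan with an interval index and distribute the per-variant work across workers. This sketch is meant only to make the overlap-and-rank idea concrete.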
SUBMITTER: Kuksa PP
PROVIDER: S-EPMC7320617 | biostudies-literature | 2020 Jun
REPOSITORIES: biostudies-literature
Pavel P Kuksa, Chien-Yueh Lee, Alexandre Amlie-Wolf, Prabhakaran Gangadharan, Elizabeth E Mlynarski, Yi-Fan Chou, Han-Jen Lin, Heather Issen, Emily Greenfest-Allen, Otto Valladares, Yuk Yee Leung, Li-San Wang
Bioinformatics (Oxford, England), 2020-06-01, issue 12