ABSTRACT:
Background: Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear.
Method: We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation.
Results: By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings.
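The idea of comparing provenance across two computational conditions can be illustrated with a minimal sketch. This is not Spot's implementation and does not use the ReproZip API. It assumes each run has already been summarized as a mapping from process name to the files that process read and wrote, and all process and file names below are hypothetical. A process is labeled as creating differences when its inputs match across conditions but its outputs do not, and as propagating them when its inputs already differ.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest used to compare file contents."""
    return hashlib.sha256(data).hexdigest()


def differs(files_a: dict, files_b: dict) -> bool:
    """True if the two file sets differ in names or in content."""
    if files_a.keys() != files_b.keys():
        return True
    return any(checksum(files_a[p]) != checksum(files_b[p]) for p in files_a)


def classify_processes(run_a: dict, run_b: dict) -> dict:
    """Label each process shared by both runs.

    Each run maps a process name to {"inputs": {path: bytes},
    "outputs": {path: bytes}} (an assumed, simplified provenance record).
    """
    labels = {}
    for name in sorted(run_a.keys() & run_b.keys()):
        a, b = run_a[name], run_b[name]
        if not differs(a["outputs"], b["outputs"]):
            labels[name] = "stable"
        elif differs(a["inputs"], b["inputs"]):
            labels[name] = "propagates"  # inputs already differed
        else:
            labels[name] = "creates"  # same inputs, different outputs
    return labels


# Hypothetical three-step pipeline executed under two conditions.
run_a = {
    "convert": {"inputs": {"raw.nii": b"raw"}, "outputs": {"conv.nii": b"conv"}},
    "register": {"inputs": {"conv.nii": b"conv"}, "outputs": {"reg.nii": b"reg-A"}},
    "segment": {"inputs": {"reg.nii": b"reg-A"}, "outputs": {"seg.nii": b"seg-A"}},
}
run_b = {
    "convert": {"inputs": {"raw.nii": b"raw"}, "outputs": {"conv.nii": b"conv"}},
    "register": {"inputs": {"conv.nii": b"conv"}, "outputs": {"reg.nii": b"reg-B"}},
    "segment": {"inputs": {"reg.nii": b"reg-B"}, "outputs": {"seg.nii": b"seg-B"}},
}

labels = classify_processes(run_a, run_b)
```

In this toy data, only the hypothetical "register" step receives identical inputs and produces different outputs, so it is the one flagged as creating the difference; the downstream step merely inherits it.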
SUBMITTER: Salari A
PROVIDER: S-EPMC7710495 | biostudies-literature | 2020 Dec
REPOSITORIES: biostudies-literature
Salari A, Kiar G, Lewis L, Evans AC, Glatard T
GigaScience, 2020-12-01, issue 12