
Dataset Information


Experimenting with reproducibility: a case study of robustness in bioinformatics.


ABSTRACT: Reproducibility has been shown to be limited in many scientific fields. Although reproducibility is a fundamental tenet of scientific activity, the related issue of the reusability of scientific data is poorly documented. Here, we present a case study of the difficulties we encountered in reproducing a published bioinformatics method, even though code and data were available. First, we tried to rerun the analysis with the code and data provided by the authors. Second, we reimplemented the whole method in a Python package to avoid dependence on a MATLAB license and to ease execution of the code on a high-performance computing cluster. Third, we assessed the reusability of our reimplementation and the quality of our documentation by testing how easy it would be to reproduce the results starting from our implementation. In a second section, we draw on this case study and other observations to propose solutions that improve reproducibility and research efficiency at the individual and collective levels. While finalizing our code, we created case-specific documentation and tutorials for the associated Python package StratiPy. Readers are invited to experiment with our reproducibility case study by generating the two confusion matrices (see the section "Robustness: from MATLAB to Python, language and organization"). We offer two options: a step-by-step process to follow in a Jupyter/IPython notebook, or a Docker container ready to be built and run.

SUBMITTER: Kim YM

PROVIDER: S-EPMC6054242 | biostudies-literature | 2018 Jul

REPOSITORIES: biostudies-literature


Publications

Experimenting with reproducibility: a case study of robustness in bioinformatics.

Kim Yang-Min, Poline Jean-Baptiste, Dumas Guillaume

GigaScience, vol. 7, 1 July 2018



Similar Datasets

Date | Accession | Repository
 | S-EPMC5653292 | biostudies-literature
 | S-EPMC4372097 | biostudies-literature
 | S-EPMC4264875 | biostudies-literature
 | S-EPMC4252088 | biostudies-literature
 | S-EPMC5989067 | biostudies-literature
 | S-EPMC8566820 | biostudies-literature
 | S-EPMC3005918 | biostudies-literature
2018-10-31 | GSE89044 | GEO
 | S-EPMC9269905 | biostudies-literature
2015-01-30 | GSE63378 | GEO