Dataset Information

Assessing and assuring interoperability of a genomics file format.


ABSTRACT:

Motivation

Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes interoperability problems between different tools that, at best, waste time and frustrate users. At worst, interoperability issues could lead to undetected errors in scientific results.

Results

We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases: potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software's performance on the test suite.
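To illustrate the kind of edge-case testing described above, the following is a minimal sketch and not Acidbio's actual implementation. It assumes a simplified BED3 rule set (three tab-separated fields, non-negative integer coordinates, chromStart no greater than chromEnd). The names `is_valid_bed3`, `EDGE_CASES`, and `correctness` are hypothetical and chosen only for this example.

```python
def is_valid_bed3(line: str) -> bool:
    """Check one line against a simplified BED3 rule set (illustrative assumption)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return False  # BED needs at least chrom, chromStart, chromEnd
    chrom, start, end = fields[:3]
    if not chrom or any(c.isspace() for c in chrom):
        return False  # chrom must be non-empty and contain no whitespace
    # Require plain ASCII digits; this rejects signs, decimals, and empty fields.
    for value in (start, end):
        if not (value.isascii() and value.isdigit()):
            return False
    return int(start) <= int(end)


# Hypothetical edge cases: each input line maps to the expected accept/reject verdict.
EDGE_CASES = {
    "chr1\t0\t100": True,     # ordinary interval
    "chr1\t100\t100": True,   # zero-length interval (assumed to be allowed)
    "chr1\t200\t100": False,  # start greater than end
    "chr1\t-5\t100": False,   # negative coordinate
    "chr1\t1.5\t100": False,  # non-integer coordinate
    "chr1\t0": False,         # missing chromEnd
}


def correctness(validator, cases) -> float:
    """Fraction of cases where the validator's verdict matches the expected one."""
    passed = sum(validator(line) == expected for line, expected in cases.items())
    return passed / len(cases)


if __name__ == "__main__":
    print(f"correctness: {correctness(is_valid_bed3, EDGE_CASES):.0%}")
```

A tool under test would take the place of `is_valid_bed3`, and its score against the expected verdicts corresponds loosely to the per-package correctness percentages reported in the abstract.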

Availability

Acidbio is available at https://github.com/hoffmangroup/acidbio.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Niu YN 

PROVIDER: S-EPMC9237710 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC4874736 | biostudies-literature
| S-EPMC8896640 | biostudies-literature
| S-EPMC2655813 | biostudies-literature
| S-EPMC2945790 | biostudies-literature
| S-EPMC7265431 | biostudies-literature
| S-EPMC8522443 | biostudies-literature
| S-EPMC10069377 | biostudies-literature
| S-EPMC10492740 | biostudies-literature
| S-EPMC9980008 | biostudies-literature
| S-EPMC6018389 | biostudies-literature