Dataset Information

Escape Excel: A tool for preventing gene symbol and accession conversion errors.


ABSTRACT:

Background

Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions irreversibly corrupt the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.

Results

Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is available in several forms (http://www.github.com/pstew/escape_excel): a command-line Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web server, and a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and as a simple non-Galaxy web server (http://apostl.moffitt.org:8000/).

Conclusions

Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
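
The kinds of problematic strings listed above can be illustrated with a short Python sketch. This is not the Escape Excel implementation (which is a Perl script with more extensive rules); the patterns, the 12-digit threshold, and the function names `needs_escaping`, `escape_field`, and `escape_line` are all assumptions made for illustration. The sketch also assumes that wrapping a field as `="..."` is an acceptable way to make Excel treat it as text; the abstract does not state the exact escaping mechanism the tool uses.

```python
import re

# Month names and abbreviations that Excel may read as part of a date.
_MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
    r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)

# Illustrative patterns only (assumptions, not the tool's actual rule set).
_PATTERNS = [
    # Date-like: SEPT1, MARCH1, DEC-1
    re.compile(_MONTH + r"[-/ ]?\d{1,4}", re.IGNORECASE),
    # Date-like: 1-Mar, 2 Sep
    re.compile(r"\d{1,4}[-/ ]?" + _MONTH, re.IGNORECASE),
    # Date-like: 1/2, 2017-09-27
    re.compile(r"\d{1,4}[-/]\d{1,4}(?:[-/]\d{1,4})?"),
    # Time-like: 12:30, 1:02:03
    re.compile(r"\d{1,2}:\d{2}(?::\d{2})?"),
    # Leading zeroes in front of numbers: 00123
    re.compile(r"0\d+"),
    # Identifiers that look like scientific notation: 2310009E13
    re.compile(r"\d+(?:\.\d+)?e[+-]?\d+", re.IGNORECASE),
    # Long numeric identifiers (the 12-digit threshold is an assumption)
    re.compile(r"\d{12,}"),
]


def needs_escaping(field: str) -> bool:
    """Return True if the whole field matches one of the risky patterns."""
    return any(p.fullmatch(field) for p in _PATTERNS)


def escape_field(field: str) -> str:
    """Wrap a risky field as ="..." so Excel keeps it as text."""
    if needs_escaping(field):
        # Double any embedded quotes so the wrapped string stays valid.
        return '="' + field.replace('"', '""') + '"'
    return field


def escape_line(line: str, sep: str = "\t") -> str:
    """Escape each field of one delimited line independently."""
    return sep.join(escape_field(f) for f in line.split(sep))


if __name__ == "__main__":
    print(escape_line("SEPT1\tTP53\t00123\t2310009E13"))
```

In this sketch, ordinary symbols such as `TP53` pass through unchanged, while fields such as `SEPT1`, `00123`, and `2310009E13` are wrapped before the file is opened in Excel.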

SUBMITTER: Welsh EA 

PROVIDER: S-EPMC5617173 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

Escape Excel: A tool for preventing gene symbol and accession conversion errors.

Welsh, Eric A; Stewart, Paul A; Kuenzi, Brent M; Eschrich, James A

PLoS ONE, 2017-09-27, issue 9


Similar Datasets

| S-EPMC2771038 | biostudies-literature
| S-EPMC7959593 | biostudies-literature
| S-EPMC459209 | biostudies-literature
| S-EPMC2561161 | biostudies-literature
| S-EPMC9284118 | biostudies-literature
| S-EPMC4173023 | biostudies-literature
| S-EPMC10589215 | biostudies-literature
| S-EPMC5068242 | biostudies-literature
| S-EPMC7817669 | biostudies-literature
| S-EPMC5682037 | biostudies-literature