Dataset Information

EPA's DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research.


ABSTRACT: The US Environmental Protection Agency's (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched publicly in 2004, currently exceeds 875,000 substances spanning hundreds of lists of interest to EPA and environmental researchers. From its inception, DSSTox has focused curation efforts on resolving chemical identifier errors and conflicts in the public domain, with the goal of assigning accurate chemical structures to data and lists of importance to the environmental research and regulatory community. Accurate structure-data associations, in turn, are necessary inputs to structure-based predictive models supporting hazard and risk assessments. In 2014, the legacy, manually curated DSSTox_V1 content was migrated to a MySQL data model, with modern cheminformatics tools supporting both manual and automated curation processes to improve efficiency. This was followed by sequential auto-loads of filtered portions of three public datasets: EPA's Substance Registry Services (SRS), the National Library of Medicine's ChemID, and PubChem. This process was constrained by a key requirement of uniquely mapped identifiers (i.e., CAS RN, name and structure) for each substance, rejecting content where any two identifiers conflicted either within or across datasets. This rejected content highlighted the degree of conflicting, inaccurate substance-structure ID mappings in the public domain, ranging from 12% (within EPA SRS) to 49% (across ChemID and PubChem). Substances successfully added to DSSTox from each auto-load were assigned to one of five qc_levels, conveying curator confidence in each dataset. This process enabled a significant expansion of DSSTox content to provide better coverage of the chemical landscape of interest to environmental scientists, while retaining focus on the accuracy of substance-structure-data associations.
Currently, DSSTox serves as the core foundation of EPA's CompTox Chemicals Dashboard [https://comptox.epa.gov/dashboard], which provides public access to DSSTox content in support of a broad range of modeling and research activities within EPA and, increasingly, across the field of computational toxicology.
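
The uniquely-mapped-identifier requirement described in the abstract can be illustrated with a small sketch. This is not the DSSTox loading code: the record layout, the field names (`casrn`, `name`, `structure`), the function name, and the example records are all assumptions made for illustration, and the rule is deliberately strict (any identifier value that appears in more than one distinct CAS RN/name/structure combination causes every record carrying it to be rejected).

```python
from collections import defaultdict

# Hypothetical field names; the real DSSTox schema is not given in the abstract.
FIELDS = ("casrn", "name", "structure")


def filter_uniquely_mapped(records):
    """Split records into (accepted, rejected).

    A record is rejected when any of its identifier values also occurs
    in a different (casrn, name, structure) combination, i.e. when the
    identifiers are not mapped one-to-one across the pooled records.
    """
    # For each field, map every value to the set of full triples it appears in.
    seen = {field: defaultdict(set) for field in FIELDS}
    for rec in records:
        triple = tuple(rec[field] for field in FIELDS)
        for field in FIELDS:
            seen[field][rec[field]].add(triple)

    accepted, rejected = [], []
    for rec in records:
        conflicted = any(len(seen[field][rec[field]]) > 1 for field in FIELDS)
        (rejected if conflicted else accepted).append(rec)
    return accepted, rejected


# Illustrative records only (not taken from SRS, ChemID, or PubChem).
EXAMPLE = [
    {"casrn": "50-00-0", "name": "Formaldehyde", "structure": "C=O"},
    {"casrn": "71-43-2", "name": "Benzene", "structure": "c1ccccc1"},
    # Same CAS RN and structure, different name: conflicts with the record above.
    {"casrn": "71-43-2", "name": "Benzol", "structure": "c1ccccc1"},
]

accepted, rejected = filter_uniquely_mapped(EXAMPLE)
print("accepted:", [r["name"] for r in accepted])
print("rejection rate:", len(rejected) / len(EXAMPLE))
```

Pooling records from several sources before calling the function checks conflicts across datasets, while running it on one source alone checks conflicts within that dataset. A production workflow would presumably also normalize names and structures before comparing them, which this sketch does not attempt.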

SUBMITTER: Grulke CM 

PROVIDER: S-EPMC7787967 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6850623 | biostudies-literature
| S-EPMC5568366 | biostudies-literature
| S-EPMC8728139 | biostudies-literature
| S-EPMC2909940 | biostudies-literature
| S-EPMC6134592 | biostudies-literature
| S-EPMC3369752 | biostudies-other
| S-EPMC4865329 | biostudies-literature
| S-EPMC4872271 | biostudies-literature
| S-EPMC7313240 | biostudies-literature
| S-EPMC3118473 | biostudies-other