Dataset Information

Improved standardization of transcribed digital specimen data.

ABSTRACT: There are more than 1.2 billion biological specimens in the world's museums and herbaria. These objects are particularly important forms of biological sample and observation. They underpin biological taxonomy but the data they contain have many other uses in the biological and environmental sciences. Nevertheless, from their conception they are almost entirely documented on paper, either as labels attached to the specimens or in catalogues linked with catalogue numbers. In order to make the best use of these data and to improve the findability of these specimens, these data must be transcribed digitally and made to conform to standards, so that these data are also interoperable and reusable. Through various digitization projects, the authors have experimented with transcription by volunteers, expert technicians, scientists, commercial transcription services and automated systems. We have also been consumers of specimen data for taxonomical, biogeographical and ecological research. In this paper, we draw from our experiences to make specific recommendations to improve transcription data. The paper is split into two sections. We first address issues related to database implementation with relevance to data transcription, namely versioning, annotation, unknown and incomplete data and issues related to language. We then focus on particular data types that are relevant to biological collection specimens, namely nomenclature, dates, geography, collector numbers and uniquely identifying people. We make recommendations to standards organizations, software developers, data scientists and transcribers to improve these data with the specific aim of improving interoperability between collection datasets.

SUBMITTER: Groom Q

PROVIDER: S-EPMC6901386 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature

Publications

Improved standardization of transcribed digital specimen data.

Groom Q, Dillen M, Hardy H, Phillips S, Willemse L, Wu Z

Database: the journal of biological databases and curation, 2019-01-01

PMID: 31819990

Similar Datasets

Improved spatial ecological sampling using open data and standardization: an example from malaria mosquito surveillance.
| S-EPMC6505554 | biostudies-literature

Methodology in specimen fabrication for in vitro dental studies: Standardization of extracted tooth preparation.
| S-EPMC5549588 | biostudies-other

Experimental Method for Tensile Testing of Unidirectional Carbon Fibre Composites Using Improved Specimen Type and Data Analysis.
| S-EPMC8303603 | biostudies-literature

Plant specimen contextual data consensus.
| S-EPMC5572840 | biostudies-literature

Data standardization of plant-pollinator interactions.
| S-EPMC9154084 | biostudies-literature

Using Delaunay triangulation to sample whole-specimen color from digital images.
| S-EPMC8462138 | biostudies-literature

Stanford DRO Toolkit: Digital Reference Objects for Standardization of Radiomic Features.
| S-EPMC7289253 | biostudies-literature

Standardization in Quantitative Imaging: A Multicenter Comparison of Radiomic Features from Different Software Packages on Digital Reference Objects and Patient Data Sets.
| S-EPMC7289262 | biostudies-literature

Collections Education: The Extended Specimen and Data Acumen.
| S-EPMC8824687 | biostudies-literature

Cardiac arrest risk standardization using administrative data compared to registry data.
| S-EPMC5544239 | biostudies-literature