Dataset Information

Extraction of data deposition statements from the literature: a method for automatically tracking research results.

ABSTRACT: Research in the biomedical domain can have a major impact through open sharing of the data produced. For this reason, it is important to be able to identify instances of data production and deposition for potential re-use. Herein, we report on the automatic identification of data deposition statements in research articles. We apply machine learning algorithms to sentences extracted from full-text articles in PubMed Central in order to automatically determine whether a given article contains a data deposition statement, and to retrieve the specific statements. With a Support Vector Machine classifier using deposition features determined by a conditional random field, articles containing deposition statements are correctly identified with 81% F-measure. An error analysis shows that almost half of the articles classified as containing a deposition statement by our method but not by the gold standard do indeed contain a deposition statement. In addition, our system was used to process articles in PubMed Central, predicting that a total of 52,932 articles report data deposition, many of which are not currently included in the Secondary Source Identifier [si] field for MEDLINE citations. All annotated datasets described in this study are freely available from the NLM/NCBI website at http://www.ncbi.nlm.nih.gov/CBBresearch/Fellows/Neveol/DepositionDataSets.zip. Contact: aurelie.neveol@nih.gov; john.wilbur@nih.gov; zhiyong.lu@nih.gov. Supplementary data are available at Bioinformatics online.
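As a rough illustration of the task described in the abstract, the sketch below flags candidate deposition sentences, marks an article as positive when at least one sentence is flagged, and computes a balanced F-measure against gold labels. It is not the authors' method: a hypothetical regular expression stands in for the trained SVM/CRF models, the "any flagged sentence" aggregation rule is an assumption, and the function names and example accession number are invented for illustration.

```python
import re

# Hypothetical cue pattern; a stand-in for a trained classifier,
# NOT the feature set used in the paper.
DEPOSITION_CUE = re.compile(
    r"\b(deposited|submitted)\b.*\b(GenBank|GEO|PDB|accession)\b",
    re.IGNORECASE,
)


def deposition_sentences(sentences):
    """Return the sentences that match the cue pattern."""
    return [s for s in sentences if DEPOSITION_CUE.search(s)]


def article_has_deposition(sentences):
    """Assumed rule: an article is positive if any sentence is flagged."""
    return bool(deposition_sentences(sentences))


def f_measure(predicted, gold):
    """Balanced F-measure (F1) over parallel lists of booleans."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy articles, each a list of sentences (invented examples).
    articles = [
        ["The sequences were deposited in GenBank under accession number AB123456.",
         "Cells were cultured overnight."],
        ["We thank our colleagues for helpful comments."],
    ]
    print([article_has_deposition(a) for a in articles])
```

A real system would replace the regular expression with a trained sentence classifier; the article-level aggregation and the F-measure computation would stay the same shape.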

SUBMITTER: Neveol A

PROVIDER: S-EPMC3223368 | biostudies-literature | 2011 Dec

REPOSITORIES: biostudies-literature


Publications

Extraction of data deposition statements from the literature: a method for automatically tracking research results.

Névéol A, Wilbur WJ, Lu Z

Bioinformatics (Oxford, England), 2011-10-13, issue 23


PMID: 21998156

Similar Datasets

Project description:

Introduction: Value and waste in preclinical and clinical research projects are intensively debated in biomedicine at present. The debate addresses aspects as varied as the need for setting objectives and priorities, improving study design, quality of reporting, and problematic incentives of the academic reward system. While this debate is also fueled by ethical considerations and thus informed by bioethical research, the field of bioethics has so far lacked a similarly extensive debate. Nonetheless, bioethical research should not go unquestioned regarding its scientific or social value. What exactly constitutes the value of bioethical research, however, remains largely unclear.

Methods: This explorative study investigated possible value dimensions for bioethical research by conducting a qualitative literature analysis of researchers' statements about the value of their studies. Forty bioethics articles published in 2015 in four relevant journals (The American Journal of Bioethics, Bioethics, BMC Medical Ethics and Journal of Medical Ethics) were analyzed. The value dimensions of "advancing knowledge" (e.g. research results that are relevant for science itself and for further research) and "application" (e.g. increasing applicability of research results in practice) were used as main deductive categories for the analysis. Further subcategories were inductively generated.

Results: The analysis resulted in 62 subcategories representing a wide range of value dimensions for bioethical research. Of these, 45 were subcategories of "advancing knowledge" and 17 of "application". In 21 articles, no value dimensions related to "application" were found; the remaining 19 articles mentioned "advancing knowledge" as well as "application". The value dimensions related to "advancing knowledge" were, in general, more fine-grained.

Conclusions: Even though limitations arise regarding the sample, the study revealed a plethora of value dimensions that can inform further debates about what makes bioethical research valuable for science and society. Besides theoretical reflections on the value of bioethics, more meta-research in bioethics is needed.
