Dataset Information

Facilitating the development of controlled vocabularies for metabolomics technologies with text mining.

ABSTRACT:

Background

Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, constructing these resources manually is time-consuming and non-trivial.

Results

We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the need for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms, respectively. A total of 5,699 and 2,612 new terms, respectively, were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections), rather than paper abstracts, are the major source of technology-specific terms.

Conclusions

We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.
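
To make the idea of corpus-based vocabulary expansion concrete, the following is a minimal sketch, not the authors' tool. It assumes a simple heuristic that is not stated in the abstract: a multi-word phrase from the corpus is proposed as a new term if it occurs often enough and ends in a word that already appears in the seed vocabulary. The function names, the stopword list, and the frequency threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "was", "were", "of", "in", "and", "to", "with"}

def tokenize(text):
    """Lowercase the text and return its word-like tokens."""
    return re.findall(r"[a-z][a-z0-9-]*", text.lower())

def candidate_terms(text, max_len=3):
    """Yield every word n-gram (1..max_len words) within each sentence."""
    for sentence in re.split(r"[.;:!?]", text):
        tokens = tokenize(sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                yield " ".join(tokens[i:i + n])

def expand_vocabulary(seed_terms, documents, min_freq=2, max_len=3):
    """Propose new multi-word terms that end in a word used by a seed term.

    This head-word heuristic is an illustrative assumption, not the
    method described in the paper.
    """
    seed = {t.lower() for t in seed_terms}
    seed_words = {w for t in seed for w in t.split()}
    counts = Counter()
    for doc in documents:
        counts.update(candidate_terms(doc, max_len))
    new_terms = {}
    for term, freq in counts.items():
        words = term.split()
        if term in seed or freq < min_freq or len(words) < 2:
            continue
        if words[0] in STOPWORDS or words[-1] in STOPWORDS:
            continue
        if words[-1] in seed_words:
            new_terms[term] = freq
    return new_terms
```

Called with a manually compiled seed list and a collection of article texts, `expand_vocabulary` returns a dictionary mapping each proposed term to its corpus frequency, which a curator could then review before adding terms to the vocabulary.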

SUBMITTER: Spasic I

PROVIDER: S-EPMC2367623 | biostudies-literature | 2008 Apr

REPOSITORIES: biostudies-literature

Publications

Facilitating the development of controlled vocabularies for metabolomics technologies with text mining.

Spasić I, Schober D, Sansone SA, Rebholz-Schuhmann D, Kell DB, Paton NW

BMC Bioinformatics, 2008 Apr 29

PMID: 18460187

Similar Datasets

Text-mining approach to evaluate terms for ontology development. | S-EPMC3153945 | biostudies-literature

Gene prioritization of resistant rice gene against Xanthomas oryzae pv. oryzae by using text mining technologies. | S-EPMC3859262 | biostudies-literature

Preimplantation development regulatory pathway construction through a text-mining approach. | S-EPMC3287586 | biostudies-literature

PubMedPortable: A Framework for Supporting the Development of Text Mining Applications. | S-EPMC5051953 | biostudies-literature

Controlled vocabularies and semantics in systems biology. | S-EPMC3261705 | biostudies-other

Understanding disciplinary vocabularies using a full-text enabled domain-independent term extraction approach. | S-EPMC5706669 | biostudies-literature

Global Text Mining and Development of Pharmacogenomic Knowledge Resource for Precision Medicine. | S-EPMC6692532 | biostudies-literature

Facilitating text reading in posterior cortical atrophy. | S-EPMC4520813 | biostudies-literature

Text Mining in Organizational Research. | S-EPMC5975701 | biostudies-literature

Getting started in text mining. | S-EPMC2217579 | biostudies-literature