
Dataset Information


Text mining and manual curation of chemical-gene-disease networks for the comparative toxicogenomics database (CTD).


ABSTRACT:

Background

The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding of the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open-source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage.

Results

Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, which increased both the mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking) and the correlation coefficient between document rank and the number of curatable interactions per document (0.14 for the baseline vs. 0.38 for the rule-based re-ranking).
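The two ranking metrics reported above can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state. Relevance is treated as binary, so a document counts as relevant if it contains at least one curatable interaction. Average precision is normalized by the number of relevant documents in the list. MAP averages over several ranked lists. The correlation is Pearson's. The function names and inputs are hypothetical and illustrative only. They do not reproduce the authors' evaluation code.

```python
from statistics import mean


def average_precision(ranked_relevance):
    """Average precision for one ranked list of documents.

    ranked_relevance[i] is True if the document at rank i+1 is relevant
    (assumption: binary label, e.g. "has at least one curatable interaction").
    Precision is sampled at each relevant rank and averaged over the number
    of relevant documents found in the list.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: the mean of average precision over several ranked lists."""
    return mean(average_precision(r) for r in rankings)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Two hypothetical ranked lists of documents (True = curatable).
    print(mean_average_precision([[True, False, True], [False, True]]))
    # Hypothetical rank scores (higher = ranked earlier) vs. interaction counts.
    print(pearson([4, 3, 2, 1], [5, 3, 0, 1]))
```

The sign of the rank correlation depends on the rank convention. If rank 1 denotes the top document, a good ordering yields a negative coefficient. For that reason, the sketch correlates a "higher is better" rank score with the interaction counts.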

Conclusion

This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text-mining solution to enhance manual curation throughput and efficiency.

SUBMITTER: Wiegers TC 

PROVIDER: S-EPMC2768719 | biostudies-literature | 2009 Oct

REPOSITORIES: biostudies-literature


Publications

Text mining and manual curation of chemical-gene-disease networks for the comparative toxicogenomics database (CTD).

Wiegers TC, Davis AP, Cohen KB, Hirschman L, Mattingly CJ

BMC Bioinformatics, 8 October 2009



Similar Datasets

| S-EPMC3629079 | biostudies-literature
| S-EPMC5047769 | biostudies-literature
| S-EPMC6891984 | biostudies-literature
| S-EPMC3842776 | biostudies-literature
| S-EPMC3515863 | biostudies-literature
| S-EPMC5130168 | biostudies-literature
| S-EPMC4457984 | biostudies-literature
| S-EPMC6917032 | biostudies-literature
| S-EPMC5502359 | biostudies-literature
| S-EPMC1560402 | biostudies-literature