Dataset Information

Automating document classification for the Immune Epitope Database.

ABSTRACT:

Background

The Immune Epitope Database contains information on immune epitopes curated manually from the scientific literature. Like similar projects in other knowledge domains, significant effort is spent on identifying which articles are relevant for this purpose.

Results

We report our experience automating this process using Naïve Bayes classifiers trained on 20,910 abstracts classified by domain experts. We improved on the basic classifier's performance by (a) using information stored in PubMed beyond the abstract itself, (b) applying standard feature selection criteria, and (c) extracting domain-specific feature patterns that, for example, identify peptide sequences. We have integrated the classifier into the curation process, where it determines whether an abstract is clearly relevant, clearly irrelevant, or cannot be classified with certainty; abstracts in the last group are classified manually. Testing this classification scheme on an independent dataset, we achieved 95% sensitivity and specificity on the 51.1% of abstracts that were classified automatically.
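The triage scheme described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the peptide pattern, the `__PEPTIDE__` marker token, the class and function names, the toy training abstracts, and the thresholds of -1.0 and 1.0 are all assumptions chosen for illustration. The sketch trains a two-class multinomial Naïve Bayes model with add-one smoothing, adds one domain-specific feature for peptide-like strings, and routes abstracts whose log-odds fall between the two thresholds to manual review.

```python
import math
import re
from collections import Counter

# Hypothetical domain pattern: 8 or more consecutive one-letter amino acid codes.
PEPTIDE_RE = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]{8,}\b")


def tokenize(text):
    """Return lowercase word tokens, plus a marker token if a peptide-like string occurs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if PEPTIDE_RE.search(text):
        tokens.append("__PEPTIDE__")
    return tokens


class TwoClassNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing for two labels."""

    def fit(self, texts, labels, positive="relevant", negative="irrelevant"):
        self.pos, self.neg = positive, negative
        self.counts = {positive: Counter(), negative: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[positive]) | set(self.counts[negative])
        return self

    def log_odds(self, text):
        """Log-odds of relevant vs. irrelevant; tokens unseen in training are skipped."""
        score = math.log(self.docs[self.pos] / self.docs[self.neg])
        v = len(self.vocab)
        pos_total = sum(self.counts[self.pos].values())
        neg_total = sum(self.counts[self.neg].values())
        for tok in tokenize(text):
            if tok in self.vocab:
                p = (self.counts[self.pos][tok] + 1) / (pos_total + v)
                n = (self.counts[self.neg][tok] + 1) / (neg_total + v)
                score += math.log(p / n)
        return score


def triage(score, low=-1.0, high=1.0):
    """Map a log-odds score to one of three outcomes (thresholds are illustrative)."""
    if score >= high:
        return "relevant"
    if score <= low:
        return "irrelevant"
    return "manual"


# Toy training data, invented for this example only.
train_texts = [
    "T cell epitope peptide SIINFEKL binds MHC",
    "antibody epitope mapping of viral peptide",
    "crystal growth in steel alloys",
    "traffic flow model for highways",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = TwoClassNaiveBayes().fit(train_texts, train_labels)

for abstract in ["epitope peptide binding", "steel traffic", "quantum chromodynamics"]:
    print(abstract, "->", triage(model.log_odds(abstract)))
```

Widening the gap between the two thresholds sends more abstracts to manual review and should reduce errors among those classified automatically; narrowing it does the opposite. In practice the thresholds would be tuned on held-out data classified by experts.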

Conclusion

By implementing text classification, we have sped up the reference selection process without sacrificing the sensitivity or specificity of the human expert classification. This study provides both practical recommendations for users of text classification tools and a large dataset that can serve as a benchmark for tool developers.

SUBMITTER: Wang P

PROVIDER: S-EPMC1965490 | biostudies-literature | 2007 Jul

REPOSITORIES: biostudies-literature

Publications

Automating document classification for the Immune Epitope Database.

Wang P, Morgan AA, Zhang Q, Sette A, Peters B

BMC Bioinformatics, 2007 Jul 26

PMID: 17655769

Similar Datasets

Immune epitope database analysis resource (IEDB-AR).
| S-EPMC2447801 | biostudies-literature

Epitope Specific Antibodies and T Cell Receptors in the Immune Epitope Database.
| S-EPMC6255941 | biostudies-literature

IEDB-AR: immune epitope database-analysis resource in 2019.
| S-EPMC6602498 | biostudies-literature

IEDB-3D: structural data within the immune epitope database.
| S-EPMC3013771 | biostudies-literature

Better living through ontologies at the Immune Epitope Database.
| S-EPMC5467561 | biostudies-literature

The Immune Epitope Database and Analysis Resource in Epitope Discovery and Synthetic Vaccine Design.
| S-EPMC5348633 | biostudies-literature

Automatic document classification of biological literature.
| S-EPMC1559726 | biostudies-literature

Automating sleep stage classification using wireless, wearable sensors.
| S-EPMC6925191 | biostudies-literature

Automating document classification with distant supervision to increase the efficiency of systematic reviews: A case study on identifying studies with HIV impacts on female sex workers.
| S-EPMC9246134 | biostudies-literature

IEDB-3D 2.0: Structural data analysis within the Immune Epitope Database.
| S-EPMC10022491 | biostudies-literature