Dataset Information

Mapping annotations with textual evidence using an scLDA model.


ABSTRACT: Most of the knowledge regarding genes and proteins is stored in biomedical literature as free text. Extracting information from complex biomedical texts demands techniques capable of inferring biological concepts from local text regions and mapping them to controlled vocabularies. To this end, we present a sentence-based correspondence latent Dirichlet allocation (scLDA) model which, when trained with a corpus of PubMed documents with known GO annotations, performs the following tasks: 1) learning major biological concepts from the corpus, 2) inferring the biological concepts existing within text regions (sentences), and 3) identifying the text regions in a document that provide evidence for the observed annotations. When applied to new gene-related documents, a trained scLDA model is capable of predicting GO annotations and identifying text regions as textual evidence supporting the predicted annotations. This study uses GO annotation data as a testbed; the approach can be generalized to other annotated data, such as MeSH and MEDLINE documents.

SUBMITTER: Jin B 

PROVIDER: S-EPMC3243146 | biostudies-literature | 2011

REPOSITORIES: biostudies-literature

Publications

Mapping annotations with textual evidence using an scLDA model.

Jin Bo, Chen Vicky, Chen Lujia, Lu Xinghua

AMIA ... Annual Symposium proceedings. AMIA Symposium, 2011-10-22


Similar Datasets

| S-EPMC7277719 | biostudies-literature
| S-EPMC6398864 | biostudies-literature
| S-EPMC8289374 | biostudies-literature
| S-EPMC4530550 | biostudies-literature
| S-EPMC5105870 | biostudies-literature
| S-EPMC2387223 | biostudies-literature
| S-EPMC9894484 | biostudies-literature
| S-EPMC8651148 | biostudies-literature
| S-EPMC1769513 | biostudies-literature
| S-EPMC8294940 | biostudies-literature