Dataset Information

CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision.


ABSTRACT: MOTIVATION: Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities. Most existing approaches based on natural language processing extract relations from single sentence-level co-mentions, ignoring co-occurrence statistics over the whole corpus. Existing approaches counting entity co-occurrences ignore the textual context of each co-occurrence. RESULTS: We propose a novel corpus-wide co-occurrence scoring approach to relation extraction that takes the textual context of each co-mention into account. Our method, called CoCoScore, scores the certainty of stating an association for each sentence that co-mentions two entities. CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest. Instead of requiring a manually annotated training corpus, co-mentions are labeled as positives/negatives according to their presence/absence in the gold standard. We show that CoCoScore outperforms previous approaches in identifying human disease-gene and tissue-gene associations as well as in identifying physical and functional protein-protein associations in different species. CoCoScore is a versatile text mining tool to uncover pairwise associations via co-occurrence mining, within and beyond biomedical applications. AVAILABILITY AND IMPLEMENTATION: CoCoScore is available at: https://github.com/JungeAlexander/cocoscore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
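
The distant-supervision labeling step described in the abstract can be illustrated with a minimal Python sketch. This is not the CoCoScore implementation; the function name `label_co_mentions`, the tuple layout, and the example entities and sentences are all hypothetical and chosen only to show the idea that a co-mention is labeled positive when its entity pair appears in the gold standard and negative otherwise.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# An unordered entity pair, so that (A, B) and (B, A) are treated alike.
Pair = FrozenSet[str]
CoMention = Tuple[str, str, str]            # (entity_a, entity_b, sentence)
LabeledCoMention = Tuple[str, str, str, int]


def label_co_mentions(
    co_mentions: Iterable[CoMention],
    gold_standard: Set[Pair],
) -> List[LabeledCoMention]:
    """Label each co-mention 1 if its pair is in the gold standard, else 0.

    Hypothetical helper for illustration only; no manual sentence-level
    annotation is used, which is the point of distant supervision.
    """
    labeled = []
    for entity_a, entity_b, sentence in co_mentions:
        pair = frozenset((entity_a, entity_b))
        label = 1 if pair in gold_standard else 0
        labeled.append((entity_a, entity_b, sentence, label))
    return labeled


# Toy data (made up for this sketch, not taken from the paper).
gold = {frozenset(("BRCA1", "breast cancer"))}
sentences = [
    ("BRCA1", "breast cancer", "Mutations in BRCA1 increase breast cancer risk."),
    ("TP53", "asthma", "TP53 and asthma were both mentioned in the survey."),
]
labeled = label_co_mentions(sentences, gold)
```

Labels obtained this way are noisy, since a sentence can co-mention a gold-standard pair without actually stating the association; a sentence-level scoring model trained on such labels has to tolerate that noise.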

SUBMITTER: Junge A 

PROVIDER: S-EPMC6956794 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision.

Alexander Junge, Lars Juhl Jensen

Bioinformatics (Oxford, England), 2020-01-01, issue 1



Similar Datasets

| S-EPMC9627715 | biostudies-literature
| S-EPMC3465209 | biostudies-literature
| S-EPMC7647812 | biostudies-literature
| S-EPMC7148018 | biostudies-literature
| S-EPMC7415240 | biostudies-literature
| S-EPMC5051953 | biostudies-literature
| S-EPMC6910696 | biostudies-literature
| S-EPMC5005450 | biostudies-literature
| PRJEB45529 | ENA
| S-EPMC5268788 | biostudies-literature