Dataset Information

Literature mining of protein-residue associations with graph rules learned through distant supervision.

ABSTRACT: BACKGROUND: We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method uses linguistic patterns to identify amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. Finally, we present an approach to automated construction of relevant training and test data using the distant supervision model. RESULTS: The performance of the method was assessed by extracting protein-residue relations from a new, automatically generated test set of sentences containing high-confidence examples found using distant supervision. It achieved an F-measure of 0.84 on an automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. CONCLUSIONS: The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set, and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
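
To make the idea of pattern-based residue mention detection concrete, the following is a minimal Python sketch. It is not the authors' implementation: the regular expression, the name `find_residue_mentions`, and the restriction to three-letter amino acid codes are all assumptions chosen for illustration, and the abstract does not specify the actual linguistic patterns used.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical pattern (an assumption, not the paper's rules): a three-letter
# code, an optional hyphen or space, then a sequence position of 1-4 digits,
# e.g. "Ser123", "Ser-123", "Ser 123".
RESIDUE_RE = re.compile(
    r"\b(" + "|".join(THREE_TO_ONE) + r")[- ]?(\d{1,4})\b"
)


def find_residue_mentions(text):
    """Return a list of (one_letter_code, position) pairs found in text."""
    return [
        (THREE_TO_ONE[m.group(1)], int(m.group(2)))
        for m in RESIDUE_RE.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "Mutation of Ser-123 and His 57 in trypsin abolished activity."
    print(find_residue_mentions(sentence))  # [('S', 123), ('H', 57)]
```

A pattern this simple will produce false positives (for example, the English word "His" followed by a number) and misses full residue names and one-letter forms. It also does nothing to link a residue to a protein, which is the association step the abstract addresses with learned syntactic patterns.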

SUBMITTER: Ravikumar K

PROVIDER: S-EPMC3465209 | biostudies-literature | 2012 Oct

REPOSITORIES: biostudies-literature

Publications

Literature mining of protein-residue associations with graph rules learned through distant supervision.

Ravikumar Ke, Liu Haibin, Cohn Judith D, Wall Michael E, Verspoor Karin

Journal of Biomedical Semantics, 2012-10-05

PMID: 23046792

Similar Datasets

Extracting microRNA-gene relations from biomedical literature using distant supervision.
| S-EPMC5338769 | biostudies-literature

CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision.
| S-EPMC6956794 | biostudies-literature

Mining a stroke knowledge graph from literature.
| S-EPMC8319697 | biostudies-literature

PEDL: extracting protein-protein associations using deep language models and distant supervision.
| S-EPMC7355289 | biostudies-literature

Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods.
| S-EPMC5543364 | biostudies-literature

Distant Supervision for Extractive Question Summarization
| S-EPMC7148018 | biostudies-literature

Distant supervision for medical concept normalization.
| S-EPMC7415240 | biostudies-literature

Multivariate analysis of roadway multi-fatality crashes using association rules mining and rules graph structures: A case study in China.
| S-EPMC9612542 | biostudies-literature

SemaTyP: a knowledge graph based literature mining method for drug discovery.
| S-EPMC5975655 | biostudies-literature

The SNPcurator: literature mining of enriched SNP-disease associations.
| S-EPMC5844215 | biostudies-literature