Dataset Information

Large-scale extraction of gene interactions from full-text literature using DeepDive.


ABSTRACT:

MOTIVATION: A complete repository of gene-gene interactions is key for understanding cellular processes, human disease and drug response. These gene-gene interactions include both protein-protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene-gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein-protein and transcription factor interactions from over 100,000 full-text PLOS articles.

METHODS: We built an extractor for gene-gene interactions that identified candidate gene-gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions.

RESULTS: Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100,000 full-text articles.

AVAILABILITY AND IMPLEMENTATION: Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app

CONTACT: russ.altman@stanford.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
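The candidate-generation and evaluation steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' DeepDive application and does not use any DeepDive API: the gene lexicon `GENE_SYMBOLS`, the helper names `candidate_pairs` and `precision_recall`, and the example sentence are all hypothetical. The sketch only shows how gene symbols co-occurring in one sentence might be paired into candidates, and how precision and recall against a reference set are computed.

```python
import re
from itertools import combinations

# Hypothetical gene lexicon, for illustration only; a real system
# would use a full gene-symbol dictionary.
GENE_SYMBOLS = {"TP53", "MDM2", "BRCA1", "BARD1"}


def candidate_pairs(sentence, symbols=GENE_SYMBOLS):
    """Return unordered pairs of distinct gene symbols found in one sentence."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    # Deduplicate and sort so each pair has a canonical order.
    mentioned = sorted({t for t in tokens if t in symbols})
    return list(combinations(mentioned, 2))


def precision_recall(predicted, reference):
    """Compute precision and recall of predicted pairs against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    pairs = candidate_pairs("MDM2 binds TP53 and BRCA1 interacts with BARD1.")
    print(pairs)
```

In the system described above, each candidate would then be assigned a probability of being a correct interaction; the sketch stops at candidate generation and leaves that scoring step out.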

SUBMITTER: Mallory EK 

PROVIDER: S-EPMC4681986 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC3629104 | biostudies-literature
| S-EPMC7523651 | biostudies-literature
| S-EPMC4426844 | biostudies-literature
| S-EPMC3441580 | biostudies-literature
| S-EPMC5706669 | biostudies-literature
| S-EPMC8256824 | biostudies-literature
| S-EPMC4290788 | biostudies-literature
| S-EPMC4915133 | biostudies-literature
| S-EPMC10869356 | biostudies-literature
| S-EPMC3085580 | biostudies-literature