Large-scale extraction of gene interactions from full-text literature using DeepDive.
ABSTRACT: MOTIVATION: A complete repository of gene-gene interactions is key for understanding cellular processes, human disease and drug response. These gene-gene interactions include both protein-protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene-gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein-protein and transcription factor interactions from over 100,000 full-text PLOS articles. METHODS: We built an extractor for gene-gene interactions that identified candidate gene-gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions. RESULTS: Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100,000 full-text articles. AVAILABILITY AND IMPLEMENTATION: Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app CONTACT: russ.altman@stanford.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
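The evaluation described in the abstract (candidate pairs from gene symbols co-occurring in a sentence, a per-candidate probability, then precision and recall against a reference set) can be illustrated with a minimal sketch. This is not the DeepDive API and not the authors' code: the function names `candidate_pairs`, `accept`, and `precision_recall`, the whitespace-style tokenizer, the toy gene lexicon, and the 0.9 threshold are all assumptions made for illustration only.

```python
import re
from itertools import combinations


def candidate_pairs(sentence, gene_symbols):
    """Return unordered pairs of distinct gene symbols co-occurring in one sentence.

    Hypothetical helper: the tokenizer is a simple regex, not a real gene tagger.
    """
    tokens = re.findall(r"[A-Za-z0-9\-]+", sentence)
    mentioned = sorted({t for t in tokens if t in gene_symbols})
    return list(combinations(mentioned, 2))


def accept(scored_pairs, threshold=0.9):
    """Keep pairs whose probability meets the threshold (threshold value is an assumption)."""
    return {pair for pair, prob in scored_pairs.items() if prob >= threshold}


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted pairs against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example with made-up probabilities standing in for the system's output.
lexicon = {"BRCA1", "BARD1", "TP53"}
pairs = candidate_pairs("BRCA1 binds BARD1 and TP53 in vitro.", lexicon)
scores = {pairs[0]: 0.95, pairs[1]: 0.40, pairs[2]: 0.92}
kept = accept(scores)
precision, recall = precision_recall(kept, {("BARD1", "BRCA1")})
```

In this toy run two of the three candidates pass the threshold and one of them is in the reference set, so precision is 0.5 and recall is 1.0. The figures reported in the abstract come from the authors' evaluation, not from this sketch.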
SUBMITTER: Mallory EK
PROVIDER: S-EPMC4681986 | biostudies-literature
REPOSITORIES: biostudies-literature