Dataset Information

Integrating information retrieval with distant supervision for gene ontology annotation.


ABSTRACT: This article describes our participation in the Gene Ontology Curation task (GO task) of BioCreative IV, where we took part in both subtasks: (A) identification of GO evidence sentences (GOESs) for relevant genes in full-text articles and (B) prediction of GO terms for relevant genes in full-text articles. For subtask A, we trained a logistic regression model to detect GOESs based on annotations in the training data, supplemented with additional noisy negatives from an external resource. A greedy approach was then applied to associate genes with sentences. For subtask B, we designed two types of systems: (i) search-based systems, which predict GO terms based on existing annotations for GOESs of different textual granularities (i.e., full-text articles, abstracts, and sentences) using state-of-the-art information retrieval techniques (i.e., a novel application of the idea of distant supervision), and (ii) a similarity-based system, which assigns GO terms based on the distance between words in sentences and GO terms/synonyms. Our best-performing system for subtask A achieves an F1 score of 0.27 based on exact match and 0.387 allowing relaxed overlap match. Our best-performing system for subtask B, a search-based system, achieves an F1 score of 0.075 based on exact match and 0.301 considering hierarchical matches. Our search-based systems for subtask B significantly outperformed the similarity-based system. https://github.com/noname2020/Bioc
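
The similarity-based idea in the abstract (matching words in a sentence against GO term names and synonyms) can be illustrated with a minimal sketch. The abstract does not specify the distance measure, so the token-coverage score below is an assumption, not the authors' method; the function names and the three-entry toy dictionary are hypothetical and stand in for a real GO term/synonym resource.

```python
import re

# Toy GO dictionary for illustration only: GO id -> names/synonyms.
TOY_GO_TERMS = {
    "GO:0005634": ["nucleus"],
    "GO:0003677": ["DNA binding"],
    "GO:0006915": ["apoptotic process", "apoptosis"],
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(sentence_tokens, name):
    """Fraction of the term name's tokens that occur in the sentence.

    Assumed similarity measure; the source only says "distance between
    words in sentences and GO terms/synonyms".
    """
    name_tokens = tokenize(name)
    if not name_tokens:
        return 0.0
    return len(name_tokens & sentence_tokens) / len(name_tokens)


def rank_go_terms(sentence, go_terms, top_k=3):
    """Score each GO id by its best-matching name and return the top hits."""
    sentence_tokens = tokenize(sentence)
    scored = []
    for go_id, names in go_terms.items():
        best = max((coverage(sentence_tokens, n) for n in names), default=0.0)
        if best > 0:
            scored.append((go_id, best))
    # Highest score first; ties broken by GO id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    text = "The protein localizes to the nucleus and shows DNA binding activity."
    for go_id, score in rank_go_terms(text, TOY_GO_TERMS):
        print(go_id, score)
```

A real system would likely add stemming, stop-word handling, and a score threshold; those choices are left out here because the abstract does not describe them.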

SUBMITTER: Zhu D 

PROVIDER: S-EPMC4150992 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Integrating information retrieval with distant supervision for gene ontology annotation.

Dongqing Zhu, Dingcheng Li, Ben Carterette, Hongfang Liu

Database: The Journal of Biological Databases and Curation, 2014-09-01


Similar Datasets

| S-EPMC1869016 | biostudies-literature
| S-EPMC169004 | biostudies-literature
| S-EPMC2882681 | biostudies-literature
| S-EPMC2901810 | biostudies-literature
| S-EPMC9978587 | biostudies-literature
| S-EPMC5338769 | biostudies-literature
| S-EPMC308756 | biostudies-literature
| S-EPMC4397492 | biostudies-other
| S-EPMC4123725 | biostudies-literature
| S-EPMC7536087 | biostudies-literature