Dataset Information

Identifying informative subsets of the Gene Ontology with information bottleneck methods.


ABSTRACT:

Motivation

The Gene Ontology (GO) is a controlled vocabulary designed to represent the biological concepts pertaining to gene products. This study investigates methods for identifying informative subsets of GO terms automatically and objectively. This task in turn requires addressing three issues: how to represent the semantic context of GO terms, which metrics are suitable for measuring the semantic differences between terms, and how to identify an informative subset that retains as much of the original semantic information of GO as possible.

Results

We represented the semantic context of a GO term using the word-usage profile associated with the term, which makes it possible to measure the semantic difference between two terms as the difference between their semantic contexts. We further employed information bottleneck methods to automatically identify subsets of GO terms that retain as much as possible of the semantic information in an annotation database. The automatically retrieved informative subsets align well with an expert-picked GO slim subset, cover important concepts and proteins, and enhance literature-based GO annotation.
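To make the two ideas in the Results concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a word-usage profile is the normalized distribution p(w | term) of words co-occurring with a term, and it uses the agglomerative information bottleneck (Slonim and Tishby) as one instance of an information bottleneck method: at each step it merges the pair of terms whose merge loses the least information I(T; W), a cost equal to the combined prior times a weighted Jensen-Shannon divergence between the two profiles. The abstract does not state which divergence or IB variant was used, and it does not say how a representative subset is picked from the resulting groups; all function names, term labels, and word counts below are hypothetical.

```python
import math
from collections import Counter


def word_usage_profile(word_counts):
    """Normalize raw word counts for one term into p(w | term)."""
    total = sum(word_counts.values())
    return {w: c / total for w, c in word_counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; q must be nonzero wherever p is nonzero."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def weighted_js(p, q, pi_p, pi_q):
    """Weighted Jensen-Shannon divergence; weights must sum to 1."""
    words = set(p) | set(q)
    m = {w: pi_p * p.get(w, 0.0) + pi_q * q.get(w, 0.0) for w in words}
    return pi_p * kl_divergence(p, m) + pi_q * kl_divergence(q, m)


def merge_cost(prior_a, prof_a, prior_b, prof_b):
    """Information about words lost by merging two terms (agglomerative IB)."""
    total = prior_a + prior_b
    return total * weighted_js(prof_a, prof_b, prior_a / total, prior_b / total)


def agglomerate(priors, profiles, k):
    """Greedily merge the cheapest pair of clusters until k clusters remain."""
    # Each cluster holds (member term names, cluster prior, cluster profile).
    clusters = {name: ([name], priors[name], profiles[name]) for name in priors}
    while len(clusters) > k:
        names = sorted(clusters)
        pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
        a, b = min(
            pairs,
            key=lambda ab: merge_cost(
                clusters[ab[0]][1], clusters[ab[0]][2],
                clusters[ab[1]][1], clusters[ab[1]][2],
            ),
        )
        members_a, pa, qa = clusters.pop(a)
        members_b, pb, qb = clusters.pop(b)
        total = pa + pb
        # The merged profile is the prior-weighted mixture of the two profiles.
        merged = {
            w: (pa * qa.get(w, 0.0) + pb * qb.get(w, 0.0)) / total
            for w in set(qa) | set(qb)
        }
        clusters[a] = (members_a + members_b, total, merged)
    return sorted(sorted(members) for members, _, _ in clusters.values())


# Hypothetical terms and word counts, for illustration only.
counts = {
    "term_a": Counter({"kinase": 8, "phosphorylation": 2}),
    "term_b": Counter({"kinase": 7, "phosphorylation": 3}),
    "term_c": Counter({"membrane": 6, "transport": 4}),
}
profiles = {t: word_usage_profile(c) for t, c in counts.items()}
priors = {t: 1 / len(counts) for t in counts}
print(agglomerate(priors, profiles, k=2))
```

With these toy counts, the two kinase-related terms have nearly identical profiles, so merging them costs far less than merging either one with the membrane-related term (whose profile shares no words with theirs), and the call prints `[['term_a', 'term_b'], ['term_c']]`. Measuring cost in lost mutual information, rather than with a plain distance, is what ties the grouping to the goal of retaining as much semantic information as possible.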

Availability

http://carcweb.musc.edu/TextminingProjects/.

SUBMITTER: Jin B 

PROVIDER: S-EPMC2944202 | biostudies-literature | 2010 Oct

REPOSITORIES: biostudies-literature

Publications

Identifying informative subsets of the Gene Ontology with information bottleneck methods.

Jin Bo, Lu Xinghua

Bioinformatics (Oxford, England), 2010-08-11, issue 19


Similar Datasets

| S-EPMC7514526 | biostudies-literature
| S-EPMC5751813 | biostudies-literature
| S-EPMC4150992 | biostudies-literature
| S-EPMC1794235 | biostudies-other
| S-EPMC5905606 | biostudies-literature
| S-EPMC2224899 | biostudies-literature
| S-EPMC2447763 | biostudies-literature
| S-EPMC169004 | biostudies-literature