Dataset Information

GO Trimming: Systematically reducing redundancy in large Gene Ontology datasets.

ABSTRACT:

Background

The increased accessibility of gene expression tools has enabled a wide variety of experiments utilizing transcriptomic analyses. As these tools increase in prevalence, the need for improved standardization in processing and presentation of data increases, as does the need to guard against interpretation bias. Gene Ontology (GO) analysis is a powerful method of interpreting and summarizing biological functions. However, while there are many tools available to investigate GO enrichment, there remains a need for methods that directly remove redundant terms from enriched GO lists that often provide little, if any, additional information.

Findings

Here we present a simple yet novel method called GO Trimming that utilizes an algorithm designed to reduce redundancy in lists of enriched GO categories. Depending on the needs of the user, this method can be performed with variable stringency. In the example presented here, an initial list of 90 terms was reduced to 54, eliminating 36 largely redundant terms. We also compare this method to existing methods and find that GO Trimming, while simple, performs well to eliminate redundant terms in a large dataset throughout the depth of the GO hierarchy.

Conclusions

The GO Trimming method provides an alternative to other procedures, some of which involve removing large numbers of terms prior to enrichment analysis. This method should free up the researcher from analyzing overly large, redundant lists, and instead enable the concise presentation of manageable, informative GO lists. The implementation of this tool is freely available at: http://lucy.ceh.uvic.ca/go_trimming/cbr_go_trimming.py.

SUBMITTER: Jantzen SG

PROVIDER: S-EPMC3160396 | biostudies-literature | 2011 Jul

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

GO Trimming: Systematically reducing redundancy in large Gene Ontology datasets.

Jantzen Stuart G SG Sutherland Ben Jg BJ Minkley David R DR Koop Ben F BF

BMC research notes 20110728

<h4>Background</h4>The increased accessibility of gene expression tools has enabled a wide variety of experiments utilizing transcriptomic analyses. As these tools increase in prevalence, the need for improved standardization in processing and presentation of data increases, as does the need to guard against interpretation bias. Gene Ontology (GO) analysis is a powerful method of interpreting and summarizing biological functions. However, while there are many tools available to investigate GO en ...[more]

PMID: 21798041

Similar Datasets

Project description:BACKGROUND: The identification of genes that predict in vitro cellular chemosensitivity of cancer cells is of great importance. Chemosensitivity related genes (CRGs) have been widely utilized to guide clinical and cancer chemotherapy decisions. In addition, CRGs potentially share functional characteristics and network features in protein interaction networks (PPIN). METHODS: In this study, we proposed a method to identify CRGs based on Gene Ontology (GO) and PPIN. Firstly, we documented 150 pairs of drug-CCRG (curated chemosensitivity related gene) from 492 published papers. Secondly, we characterized CCRGs from the perspective of GO and PPIN. Thirdly, we prioritized CRGs based on CCRGs' GO and network characteristics. Lastly, we evaluated the performance of the proposed method. RESULTS: We found that CCRG enriched GO terms were most often related to chemosensitivity and exhibited higher similarity scores compared to randomly selected genes. Moreover, CCRGs played key roles in maintaining the connectivity and controlling the information flow of PPINs. We then prioritized CRGs using CCRG enriched GO terms and CCRG network characteristics in order to obtain a database of predicted drug-CRGs that included 53 CRGs, 32 of which have been reported to affect susceptibility to drugs. Our proposed method identifies a greater number of drug-CCRGs, and drug-CCRGs are much more significantly enriched in predicted drug-CRGs, compared to a method based on the correlation of gene expression and drug activity. The mean area under ROC curve (AUC) for our method is 65.2%, whereas that for the traditional method is 55.2%. CONCLUSIONS: Our method not only identifies CRGs with expression patterns strongly correlated with drug activity, but also identifies CRGs in which expression is weakly correlated with drug activity. This study provides the framework for the identification of signatures that predict in vitro cellular chemosensitivity and offers a valuable database for pharmacogenomics research.

Dataset Information

GO Trimming: Systematically reducing redundancy in large Gene Ontology datasets.

Background

Findings

Conclusions

Publications

GO Trimming: Systematically reducing redundancy in large Gene Ontology datasets.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets