Dataset Information

Cluster analysis of protein array results via similarity of Gene Ontology annotation.

ABSTRACT: BACKGROUND: With the advent of high-throughput proteomic experiments such as arrays of purified proteins comes the need to analyse sets of proteins as an ensemble, as opposed to the traditional one-protein-at-a-time approach. Although there are several publicly available tools that facilitate the analysis of protein sets, they do not display integrated results in an easily-interpreted image or do not allow the user to specify the proteins to be analysed. RESULTS: We developed a novel computational approach to analyse the annotation of sets of molecules. As proof of principle, we analysed two sets of proteins identified in published protein array screens. The distance between any two proteins was measured as the graph similarity between their Gene Ontology (GO) annotations. These distances were then clustered to highlight subsets of proteins sharing related GO annotation. In the first set of proteins found to bind small molecule inhibitors of rapamycin, we identified three subsets containing four or five proteins each that may help to elucidate how rapamycin affects cell growth whereas the original authors chose only one novel protein from the array results for further study. In a set of phosphoinositide-binding proteins, we identified subsets of proteins associated with different intracellular structures that were not highlighted by the analysis performed in the original publication. CONCLUSION: By determining the distances between annotations, our methodology reveals trends and enrichment of proteins of particular functions within high-throughput datasets at a higher sensitivity than perusal of end-point annotations. In an era of increasingly complex datasets, such tools will help in the formulation of new, testable hypotheses from high-throughput experimental data.
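The two steps the abstract describes (a pairwise distance derived from GO annotations, followed by clustering) can be illustrated with a minimal sketch. The abstract does not say which graph similarity measure or clustering algorithm the authors used, so this example substitutes assumptions of its own: Jaccard distance between the ancestor closures of each protein's terms, and average-linkage agglomerative clustering stopped at a fixed cutoff. The ontology terms, the protein names P1 to P4, and the cutoff value are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy ontology: term -> set of parent terms (a DAG, like GO).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "lipid_binding": {"binding"},
    "phosphoinositide_binding": {"lipid_binding"},
    "catalytic": {"root"},
    "kinase": {"catalytic"},
    "protein_kinase": {"kinase"},
}

# Hypothetical protein annotations (placeholders, not data from the paper).
ANNOTATIONS = {
    "P1": {"phosphoinositide_binding"},
    "P2": {"lipid_binding"},
    "P3": {"protein_kinase"},
    "P4": {"kinase"},
}


def closure(terms):
    """Return the given terms plus all of their ancestors in the DAG."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def distance(a, b):
    """Jaccard distance between ancestor closures (an assumed stand-in measure)."""
    ca, cb = closure(a), closure(b)
    return 1.0 - len(ca & cb) / len(ca | cb)


def cluster(annotations, cutoff):
    """Average-linkage agglomerative clustering that stops merging at `cutoff`."""
    clusters = [frozenset([name]) for name in sorted(annotations)]

    def linkage(x, y):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(p, q) for p in x for q in y]
        total = sum(distance(annotations[p], annotations[q]) for p, q in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(x, y) > cutoff:
            break
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return sorted(sorted(c) for c in clusters)


clusters = cluster(ANNOTATIONS, cutoff=0.5)
print(clusters)
```

Under these assumptions, proteins annotated to nearby terms share most of their ancestors and merge early, while proteins that share only the root term stay in separate clusters. A real analysis would load the GO graph and annotations from the published files rather than a hand-written dictionary.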

SUBMITTER: Wolting C

PROVIDER: S-EPMC1539024 | biostudies-literature | 2006

REPOSITORIES: biostudies-literature


Publications

Cluster analysis of protein array results via similarity of Gene Ontology annotation.

Wolting C, McGlade CJ, Tritchler D

BMC Bioinformatics, 2006-07-12


PMID: 16836750

Similar Datasets

Project description: BACKGROUND: Pulmonary atresia (PA) is a heterogeneous congenital heart defect, and ventricular septal defect (VSD) is the most vital factor in the conventional classification of PA patients. This simple dichotomy cannot fully describe the cardiac morphologies and pathophysiology of such a complex disease. We utilized the Human Phenotype Ontology (HPO) database to explore the phenotypic patterns of PA and the phenotypic influence on prognosis. METHODS: We recruited 786 patients with diagnoses of PA between 2008 and 2016 at Fuwai Hospital. According to the cardiovascular phenotypes of the patients, we retrieved 52 HPO terms for further analyses. The patients were classified into three clusters based on unsupervised hierarchical clustering. We used Kaplan-Meier curves to estimate survival, the log-rank test to compare survival between clusters, and univariate and multivariate Cox proportional hazards regression modeling to investigate potential risk factors. RESULTS: According to the HPO term distribution, we observed significant differences in morphological abnormalities among the 3 clusters. We defined cluster 1 as being associated with Tetralogy of Fallot (TOF), VSD, right ventricular hypertrophy (RVH), and aortopulmonary collateral arteries (ACA). ACA was not included in the cluster classification because it was not an HPO term. Cluster 2 was associated with hypoplastic right heart (HRH), atrial septal defect (ASD) and tricuspid disease as the main morphological abnormalities. Cluster 3 presented a higher frequency of single ventricle (SV), dextrocardia, and common atrium (CA). The mortality rate in cluster 1 was significantly lower than the rates in clusters 2 and 3 (p = 0.04). Multivariable analysis revealed that abnormal atrioventricular connection (AAC, p = 0.011) and persistent left superior vena cava (LSVC, p = 0.003) were associated with an increased risk of mortality. CONCLUSIONS: Our study reported a large cohort with clinical phenotype, surgical strategy, and long-term follow-up data. In addition, we provided a precise classification and successful risk stratification for patients with PA.

Project description: The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.
