Dataset Information

GIBA: a clustering tool for detecting protein complexes.

ABSTRACT:

Background

In recent years, high-throughput experimental methods have been developed that generate large datasets of protein-protein interactions (PPIs). However, owing to the experimental methodologies, these datasets contain errors, mainly false positives, which reduce the quality of any derived information. Typically, these datasets can be modeled as graphs, where vertices represent proteins and edges represent the pairwise PPIs, making it easy to apply automated clustering methods to detect protein complexes or other biologically significant functional groupings.
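The graph model described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_ppi_graph`, the protein identifiers, and the choice to drop self-interactions are assumptions made for the example and are not taken from the GIBA paper.

```python
from collections import defaultdict


def build_ppi_graph(pairs):
    """Return an undirected adjacency map: protein -> set of interaction partners."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # assumption: self-interactions are ignored in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Hypothetical interaction list; the identifiers are placeholders, not real data.
example_pairs = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P4", "P5")]
ppi_graph = build_ppi_graph(example_pairs)
```

Each vertex maps to the set of its neighbours, which is the form most graph clustering routines expect as input.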

Methods

In this paper, a clustering tool called GIBA (named after the first characters of its developers' nicknames) is presented. GIBA applies a two-step procedure to a given dataset of protein-protein interactions. First, a clustering algorithm is applied to the interaction data; a filtering step then generates the final candidate list of predicted complexes.
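A minimal sketch of such a cluster-then-filter pipeline is shown below. The abstract does not say which clustering algorithm or which filters GIBA uses, so everything here is a stand-in: connected components play the role of the clustering step, and a minimum size plus a minimum edge density play the role of the filter. All function names, thresholds, and identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(pairs):
    """Undirected adjacency map built from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Step 1 (stand-in clustering): group proteins by connected component."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def density(graph, cluster):
    """Fraction of possible edges inside the cluster that are present."""
    n = len(cluster)
    if n < 2:
        return 0.0
    edges = sum(1 for a, b in combinations(cluster, 2) if b in graph[a])
    return edges / (n * (n - 1) / 2)


def filter_clusters(graph, clusters, min_size=3, min_density=0.6):
    """Step 2 (stand-in filter): keep clusters that are large and dense enough."""
    return [
        c for c in clusters
        if len(c) >= min_size and density(graph, c) >= min_density
    ]


# Hypothetical interactions: a triangle, a single pair, and a five-protein chain.
pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"),
         ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J")]
graph = build_graph(pairs)
candidates = connected_components(graph)
predicted = filter_clusters(graph, candidates)
```

In this toy input only the triangle A, B, C survives: the pair is too small and the chain is too sparse. Keeping the two steps as separate functions makes it possible to vary the filter thresholds without re-running the clustering.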

Results

The efficiency of GIBA is demonstrated through the analysis of six different yeast protein interaction datasets, in comparison with four other available algorithms. We compared the results of the different methods using five different performance metrics. Moreover, we examined how the parameters of the methods that constitute the filter affect the final results.

Conclusion

GIBA is an effective and easy-to-use tool for detecting protein complexes in experimentally measured protein-protein interaction networks. The results show that GIBA achieves higher prediction accuracy than previously published methods.

SUBMITTER: Moschopoulos CN

PROVIDER: S-EPMC2697634 | biostudies-literature | 2009 Jun

REPOSITORIES: biostudies-literature

Publications

GIBA: a clustering tool for detecting protein complexes.

Moschopoulos CN, Pavlopoulos GA, Schneider R, Likothanassis SD, Kossida S

BMC Bioinformatics, 2009-06-16

PMID: 19534736

Similar Datasets

Project description: Background: The gene-specific sweep is a selection process where an advantageous mutation along with the nearby neutral sites in a gene region increases the frequency in the population. It has been demonstrated to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into the evolutionary mechanisms, but also unravel potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, some challenges have not been sufficiently addressed by the methods, such as the low spatial resolution of sweep regions and lack of consideration of the spatial distribution of mutations. Results: We proposed a novel gene-centric and spatial-aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a Python tool SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster is above 90%. The application of SweepCluster in two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that the impact of pre-selection was dramatic and significantly reduced the uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%. We noted that the concordance rate could be underestimated due to distinct reference genomes and clustering strategies. The application to the human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions. Conclusion: SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and provide novel insights on adaptive evolution.
