Dataset Information

Effect of conformation sampling strategies in genetic algorithm for multiple protein docking.

ABSTRACT: BACKGROUND:Macromolecular protein complexes play important roles in a cell and their tertiary structure can help understand key biological processes of their functions. Multiple protein docking is a valuable computational tool for providing structure information of multimeric protein complexes. In a previous study we developed and implemented an algorithm for this purpose, named Multi-LZerD. This method represents a conformation of a multimeric protein complex as a graph, where nodes denote subunits and each edge connecting nodes denotes a pairwise docking conformation of the two subunits. Multi-LZerD employs a genetic algorithm to sample different topologies of the graph and pairwise transformations between subunits, seeking for the conformation of the optimal (lowest) energy. In this study we explore different configurations of the genetic algorithm, namely, the population size, whether to include a crossover operation, as well as the threshold for structural clustering, to find the optimal experimental setup. METHODS:Multi-LZerD was executed to predict the structures of three multimeric protein complexes, using different population sizes, clustering thresholds, and configurations of mutation and crossover. We analyzed the impact of varying these parameters on the computational time and the prediction accuracy. RESULTS AND CONCLUSIONS:Given that computational resources is a key for handling complexes with a large number of subunits and also for computing a large number of protein complexes in a genome-scale study, finding a proper setting for sampling the conformation space is of the utmost importance. Our results show that an excessive sampling of the conformational space by increasing the population size or by introducing the crossover operation is not necessary for improving accuracy for predicting structures of small complexes. The clustering is effective in reducing redundant pairwise predictions, which leads to successful identification of near-native conformations.

SUBMITTER: Esquivel-Rodriguez J

PROVIDER: S-EPMC3504801 | biostudies-literature | 2012 Nov

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Effect of conformation sampling strategies in genetic algorithm for multiple protein docking.

Esquivel-Rodríguez Juan J Kihara Daisuke D

BMC proceedings 20121113

<h4>Background</h4>Macromolecular protein complexes play important roles in a cell and their tertiary structure can help understand key biological processes of their functions. Multiple protein docking is a valuable computational tool for providing structure information of multimeric protein complexes. In a previous study we developed and implemented an algorithm for this purpose, named Multi-LZerD. This method represents a conformation of a multimeric protein complex as a graph, where nodes den ...[more]

PMID: 23173833

Similar Datasets

Project description:The present study investigates the use of algorithm selection for automatically choosing an algorithm for any given protein-ligand docking task. In drug discovery and design process, conceptualizing protein-ligand binding is a major problem. Targeting this problem through computational methods is beneficial in order to substantially reduce the resource and time requirements for the overall drug development process. One way of addressing protein-ligand docking is to model it as a search and optimization problem. There have been a variety of algorithmic solutions in this respect. However, there is no ultimate algorithm that can efficiently tackle this problem, both in terms of protein-ligand docking quality and speed. This argument motivates devising new algorithms, tailored to the particular protein-ligand docking scenarios. To this end, this paper reports a machine learning-based approach for improved and robust docking performance. The proposed set-up is fully automated, operating without any expert opinion or involvement both on the problem and algorithm aspects. As a case study, an empirical analysis was performed on a well-known protein, Human Angiotensin-Converting Enzyme (ACE), with 1428 ligands. For general applicability, AutoDock 4.2 was used as the docking platform. The candidate algorithms are also taken from AutoDock 4.2. Twenty-eight distinctly configured Lamarckian-Genetic Algorithm (LGA) are chosen to build an algorithm set. ALORS which is a recommender system-based algorithm selection system was preferred for automating the selection from those LGA variants on a per-instance basis. For realizing this selection automation, molecular descriptors and substructure fingerprints were employed as the features characterizing each target protein-ligand docking instance. The computational results revealed that algorithm selection outperforms all those candidate algorithms. Further assessment is reported on the algorithms space, discussing the contributions of LGA's parameters. As it pertains to protein-ligand docking, the contributions of the aforementioned features are examined, which shed light on the critical features affecting the docking performance.

Project description:BackgroundComputational approaches for the determination of biologically-active/native three-dimensional structures of proteins with novel sequences have to handle several challenges. The (conformation) space of possible three-dimensional spatial arrangements of the chain of amino acids that constitute a protein molecule is vast and high-dimensional. Exploration of the conformation spaces is performed in a sampling-based manner and is biased by the internal energy that sums atomic interactions. Even state-of-the-art energy functions that quantify such interactions are inherently inaccurate and associate with protein conformation spaces overly rugged energy surfaces riddled with artifact local minima. The response to these challenges in template-free protein structure prediction is to generate large numbers of low-energy conformations (also referred to as decoys) as a way of increasing the likelihood of having a diverse decoy dataset that covers a sufficient number of local minima possibly housing near-native conformations.ResultsIn this paper we pursue a complementary approach and propose to directly control the diversity of generated decoys. Inspired by hard optimization problems in high-dimensional and non-linear variable spaces, we propose that conformation sampling for decoy generation is more naturally framed as a multi-objective optimization problem. We demonstrate that mechanisms inherent to evolutionary search techniques facilitate such framing and allow balancing multiple objectives in protein conformation sampling. We showcase here an operationalization of this idea via a novel evolutionary algorithm that has high exploration capability and is also able to access lower-energy regions of the energy landscape of a given protein with similar or better proximity to the known native structure than several state-of-the-art decoy generation algorithms.ConclusionsThe presented results constitute a promising research direction in improving decoy generation for template-free protein structure prediction with regards to balancing of multiple conflicting objectives under an optimization framework. Future work will consider additional optimization objectives and variants of improvement and selection operators to apportion a fixed computational budget. Of particular interest are directions of research that attenuate dependence on protein energy models.

Dataset Information

Effect of conformation sampling strategies in genetic algorithm for multiple protein docking.

Publications

Effect of conformation sampling strategies in genetic algorithm for multiple protein docking.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets