Dataset Information

Mathematical model for empirically optimizing large scale production of soluble protein domains.

ABSTRACT:

Background

Efficient dissection of large proteins into their structural domains is critical for high throughput proteome analysis. So far, no study has focused on mathematically modeling a protein dissection protocol in terms of a production system. Here, we report a mathematical model for empirically optimizing the cost of large-scale domain production in proteomics research.

Results

The model computes the expected number of successfully producing soluble domains, using a conditional probability between domain and boundary identification. Typical values for the model's parameters were estimated using the experimental results for identifying soluble domains from the 2,032 Kazusa HUGE protein sequences. Among the 215 fragments corresponding to the 24 domains that were expressed correctly, 111, corresponding to 18 domains, were soluble. Our model indicates that, under the conditions used in our pilot experiment, the probability of correctly predicting the existence of a domain was 81% (175/215) and that of predicting its boundary was 63% (111/175). Under these conditions, the most cost/effort-effective production of soluble domains was to prepare one to seven fragments per predicted domain.

Conclusions

Our mathematical modeling of protein dissection protocols indicates that the optimum number of fragments tested per domain is actually much smaller than expected a priori. The application range of our model is not limited to protein dissection, and it can be utilized for designing various large-scale mutational analyses or screening libraries.

SUBMITTER: Chikayama E

PROVIDER: S-EPMC2843616 | biostudies-literature | 2010 Mar

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Mathematical model for empirically optimizing large scale production of soluble protein domains.

Chikayama Eisuke E Kurotani Atsushi A Tanaka Takanori T Yabuki Takashi T Miyazaki Satoshi S Yokoyama Shigeyuki S Kuroda Yutaka Y

BMC bioinformatics 20100301

<h4>Background</h4>Efficient dissection of large proteins into their structural domains is critical for high throughput proteome analysis. So far, no study has focused on mathematically modeling a protein dissection protocol in terms of a production system. Here, we report a mathematical model for empirically optimizing the cost of large-scale domain production in proteomics research.<h4>Results</h4>The model computes the expected number of successfully producing soluble domains, using a conditi ...[more]

PMID: 20193068

Similar Datasets

Project description:BackgroundModels of metabolism are often used in biotechnology and pharmaceutical research to identify drug targets or increase the direct production of valuable compounds. Due to the complexity of large metabolic systems, a number of conclusions have been drawn using mathematical methods with simplifying assumptions. For example, constraint-based models describe changes of internal concentrations that occur much quicker than alterations in cell physiology. Thus, metabolite concentrations and reaction fluxes are fixed to constant values. This greatly reduces the mathematical complexity, while providing a reasonably good description of the system in steady state. However, without a large number of constraints, many different flux sets can describe the optimal model and we obtain no information on how metabolite levels dynamically change. Thus, to accurately determine what is taking place within the cell, finer quality data and more detailed models need to be constructed.ResultsIn this paper we present a computational framework, DMPy, that uses a network scheme as input to automatically search for kinetic rates and produce a mathematical model that describes temporal changes of metabolite fluxes. The parameter search utilises several online databases to find measured reaction parameters. From this, we take advantage of previous modelling efforts, such as Parameter Balancing, to produce an initial mathematical model of a metabolic pathway. We analyse the effect of parameter uncertainty on model dynamics and test how recent flux-based model reduction techniques alter system properties. To our knowledge this is the first time such analysis has been performed on large models of metabolism. Our results highlight that good estimates of at least 80% of the reaction rates are required to accurately model metabolic systems. Furthermore, reducing the size of the model by grouping reactions together based on fluxes alters the resulting system dynamics.ConclusionThe presented pipeline automates the modelling process for large metabolic networks. From this, users can simulate their pathway of interest and obtain a better understanding of how altering conditions influences cellular dynamics. By testing the effects of different parameterisations we are also able to provide suggestions to help construct more accurate models of complete metabolic systems in the future.

Project description:BackgroundIn the past few years, both automated and manual high-throughput protein expression and purification has become an accessible means to rapidly screen and produce soluble proteins for structural and functional studies. However, many of the commercial vectors encoding different solubility tags require different cloning and purification steps for each vector, considerably slowing down expression screening. We have developed a set of E. coli expression vectors with different solubility tags that allow for parallel cloning from a single PCR product and can be purified using the same protocol.ResultsThe set of E. coli expression vectors, encode for either a hexa-histidine tag or the three most commonly used solubility tags (GST, MBP, NusA) and all with an N-terminal hexa-histidine sequence. The result is two-fold: the His-tag facilitates purification by immobilised metal affinity chromatography, whilst the fusion domains act primarily as solubility aids during expression, in addition to providing an optional purification step. We have also incorporated a TEV recognition sequence following the solubility tag domain, which allows for highly specific cleavage (using TEV protease) of the fusion protein to yield native protein. These vectors are also designed for ligation-independent cloning and they possess a high-level expressing T7 promoter, which is suitable for auto-induction. To validate our vector system, we have cloned four different genes and also one gene into all four vectors and used small-scale expression and purification techniques. We demonstrate that the vectors are capable of high levels of expression and that efficient screening of new proteins can be readily achieved at the laboratory level.ConclusionThe result is a set of four rationally designed vectors, which can be used for streamlined cloning, expression and purification of target proteins in the laboratory and have the potential for being adaptable to a high-throughput screening.

Dataset Information

Mathematical model for empirically optimizing large scale production of soluble protein domains.

Background

Results

Conclusions

Publications

Mathematical model for empirically optimizing large scale production of soluble protein domains.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets