Dataset Information

Predicting protein crystallization propensity from protein sequence.


ABSTRACT: The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from the primary sequence correlate with a protein's propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties was computed for over 1,300 proteins that expressed well but were insoluble, and for approximately 720 unique proteins that resulted in X-ray structures. The correlation of a protein's isoelectric point and grand average of hydropathy (GRAVY) with crystallizability was analyzed for full-length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses, we identified a set of attributes that correlate with a protein's propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on them. We have created applications to analyze query sequences, provide optimal boundary information for them, and visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor .
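As an illustration of the kind of sequence-derived property the abstract refers to, the following is a minimal sketch of a GRAVY calculation: the mean Kyte-Doolittle hydropathy value over all residues of a sequence. It is not the authors' code. The function name `gravy`, the error handling, and the example sequence are assumptions made for this sketch, and the isoelectric point and the SVM classifier are not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathy (GRAVY) of a protein sequence.

    Hypothetical helper for illustration: sums the Kyte-Doolittle value of
    every residue and divides by the sequence length.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    # Reject anything outside the 20 standard residues (an assumption of
    # this sketch; ambiguous codes such as X or B are not handled).
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Made-up example sequence, not taken from the dataset.
    example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    print(f"GRAVY = {gravy(example):.3f}")
```

Positive GRAVY values indicate a more hydrophobic sequence on average and negative values a more hydrophilic one. A value like this would be one input feature among several, not a predictor of crystallizability on its own.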

SUBMITTER: Babnigg G 

PROVIDER: S-EPMC3366497 | biostudies-literature | 2010 Mar

REPOSITORIES: biostudies-literature

Publications

Predicting protein crystallization propensity from protein sequence.

Babnigg György, Joachimiak Andrzej

Journal of Structural and Functional Genomics, 2010-02-23, issue 1


Similar Datasets

| S-EPMC3117383 | biostudies-literature
| S-EPMC4657326 | biostudies-literature
| S-EPMC5133789 | biostudies-literature
| S-EPMC6137969 | biostudies-literature
| S-EPMC3760885 | biostudies-literature
| S-EPMC4141844 | biostudies-literature
| S-EPMC2242438 | biostudies-literature
| S-EPMC6171492 | biostudies-literature
| S-EPMC7523644 | biostudies-literature
| S-EPMC5563152 | biostudies-literature