
Dataset Information


Predicting binding sites from unbound versus bound protein structures.


ABSTRACT: We present the application of seven binding-site prediction algorithms to a meticulously curated dataset of ligand-bound and ligand-free crystal structures for 304 unique protein sequences (2528 crystal structures). To probe how the choice of starting protein structure influences binding-site prediction, the dataset contains at least two ligand-bound and two ligand-free structures for each protein. We use this dataset in a brief survey of five geometry-based methods, one energy-based method, and one machine-learning-based method: Surfnet, Ghecom, LIGSITEcsc, Fpocket, Depth, AutoSite, and Kalasanty. For most methods, distributions of F scores and Matthews correlation coefficients show no statistically significant difference in performance between ligand-bound and ligand-free structures. Only Fpocket showed a statistically significant, but small, enhancement in performance for holo structures. Lastly, we found that most methods succeed on some crystal structures and fail on others within the same protein family, even though all structures are of relatively high quality and show low structural variation. We expected better consistency across varying protein conformations of the same sequence. Interestingly, the success or failure of a given structure cannot be predicted by quality metrics such as resolution, Cruickshank diffraction precision index (DPI), or the number of unresolved residues. Cryptic sites were also examined.
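The abstract scores predictions with the F score and the Matthews correlation coefficient (MCC). The following is a minimal sketch of how these two metrics can be computed for a single predicted site. It assumes that the predicted and true binding sites are compared as sets of residue identifiers. The function name `f_score_and_mcc` and this residue-level matching criterion are illustrative assumptions, not the authors' published procedure.

```python
import math


def f_score_and_mcc(predicted, actual, all_residues):
    """Return (F1, MCC) for one predicted binding site.

    Hypothetical helper: sites are treated as sets of residue IDs.
    The study's exact definition of a correct residue may differ.
    """
    predicted, actual, all_residues = set(predicted), set(actual), set(all_residues)

    # Confusion-matrix counts over all residues of the structure.
    tp = len(predicted & actual)                   # site residues correctly predicted
    fp = len(predicted - actual)                   # predicted residues not in the site
    fn = len(actual - predicted)                   # site residues that were missed
    tn = len(all_residues - predicted - actual)    # correctly left unpredicted

    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # MCC uses all four counts; it is defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    residues = range(1, 11)          # toy 10-residue protein
    true_site = {1, 2, 3, 4}
    prediction = {3, 4, 5}
    print(f_score_and_mcc(prediction, true_site, residues))
```

Unlike F1, MCC also counts true negatives. It therefore penalizes predictions that are large relative to the protein surface more evenly, which is one reason the two metrics are often reported together.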

SUBMITTER: Clark JJ 

PROVIDER: S-EPMC7522209 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature


Publications

Predicting binding sites from unbound versus bound protein structures.

Jordan J. Clark, Zachary J. Orban, Heather A. Carlson

Scientific Reports (published 2020-09-28), issue 1



Similar Datasets

| S-EPMC3259439 | biostudies-literature
| S-EPMC3270014 | biostudies-literature
| S-EPMC2896164 | biostudies-literature
| S-EPMC5860604 | biostudies-literature
| S-EPMC2761413 | biostudies-literature
| S-EPMC3045618 | biostudies-literature
| S-EPMC2808974 | biostudies-literature
| S-EPMC7214031 | biostudies-literature
| S-TOXR904 | biostudies-other
| S-EPMC7400645 | biostudies-literature