
Dataset Information


CSAR benchmark exercise of 2010: selection of the protein-ligand complexes.


ABSTRACT: A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) aims to collect available data from industry and academia that may be used for this purpose (www.csardock.org). CSAR is also charged with organizing community-wide exercises based on the collected data. The first of these exercises aimed to gauge the overall state of docking and scoring, using a large and diverse data set of protein-ligand complexes. Participants were asked to calculate the affinity of the complexes as provided and then to recalculate with changes that might improve their specific method. This first data set was selected from existing PDB entries that had binding data (Kd or Ki) in Binding MOAD, augmented with entries from PDBbind. The final data set contains 343 diverse protein-ligand complexes and spans 14 pKd units. Sixteen proteins have three or more complexes in the data set, from which a user could start an inspection of congeneric series. Inherent experimental error limits the possible correlation between scores and measured affinity: Pearson R is limited to ~0.91 (R² ~0.83) when fitting to the data set without overparameterizing, and to ~0.83 (R² ~0.70) when scoring the data set with a method trained on outside data [corrected]. The details of how the data set was initially selected, and the process by which it matured to better fit the needs of the community, are presented. Many groups generously participated in improving the data set, which underscores the value of a supportive, collaborative effort in moving the field forward.
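The abstract's point that experimental error caps the achievable Pearson R can be illustrated with the standard attenuation argument: if true affinities have standard deviation sd_true and measurements carry independent error with standard deviation sd_error, even a perfect score correlates with the measured values at only sd_true / sqrt(sd_true^2 + sd_error^2). The sketch below is a generic illustration, not the authors' derivation; the function names and the values 2.0 and 1.0 are hypothetical and are not taken from the CSAR data set.

```python
import math
import random


def pearson_ceiling(sd_true: float, sd_error: float) -> float:
    """Best possible Pearson R between a perfect score and noisy measurements."""
    return sd_true / math.sqrt(sd_true ** 2 + sd_error ** 2)


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_ceiling(sd_true: float, sd_error: float, n: int = 20000, seed: int = 0) -> float:
    """Correlate simulated true affinities with their noisy 'measured' values."""
    rng = random.Random(seed)
    true_vals = [rng.gauss(0.0, sd_true) for _ in range(n)]
    # Each measurement is the true value plus independent experimental error.
    measured = [t + rng.gauss(0.0, sd_error) for t in true_vals]
    return pearson_r(true_vals, measured)


if __name__ == "__main__":
    # Hypothetical spreads, in pKd units, chosen only for illustration.
    print(round(pearson_ceiling(2.0, 1.0), 3))
    print(round(simulate_ceiling(2.0, 1.0), 3))
```

The analytic ceiling and the simulated correlation should agree closely, which shows that the limit comes from noise in the measurements and not from any weakness in the scoring method.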

SUBMITTER: Dunbar JB 

PROVIDER: S-EPMC3180202 | biostudies-literature | 2011 Sep

REPOSITORIES: biostudies-literature


Publications

CSAR benchmark exercise of 2010: selection of the protein-ligand complexes.

Dunbar JB, Smith RD, Yang CY, Ung PM, Lexa KW, Khazanov NA, Stuckey JA, Wang S, Carlson HA

Journal of Chemical Information and Modeling, 2011-07-22, issue 9



Similar Datasets

| S-EPMC3186041 | biostudies-literature
| S-EPMC6588165 | biostudies-literature
| S-EPMC3753884 | biostudies-literature
| S-EPMC3190652 | biostudies-literature
| S-EPMC2936463 | biostudies-literature
| S-EPMC3753885 | biostudies-other
| S-EPMC5005040 | biostudies-literature
| S-EPMC3726561 | biostudies-literature
| S-EPMC2442010 | biostudies-literature
| S-EPMC3779696 | biostudies-literature