
Dataset Information


Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction.


ABSTRACT: MOTIVATION: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies. RESULTS: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on the type of bias and evaluation context, sampling biases may lead to either over- or under-estimation of the quality of scoring terms, functions or methods. AVAILABILITY: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.
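
The evaluation measures named in the abstract (rank and z-score of the native, and the correlation between score and RMSD) can be illustrated with a minimal sketch. This is not the authors' software: the function names and the toy numbers are hypothetical, and the sketch assumes that a lower score means a better structure.

```python
import math
from statistics import mean, pstdev


def native_rank(native_score, decoy_scores):
    """1-based rank of the native; assumes lower score = better."""
    return 1 + sum(1 for s in decoy_scores if s < native_score)


def native_z_score(native_score, decoy_scores):
    """Z-score of the native relative to the decoy score distribution."""
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy decoy set: scores and RMSD (in Angstrom) to the native.
decoy_scores = [-80.0, -75.0, -60.0, -50.0]
decoy_rmsds = [2.0, 3.5, 5.0, 8.0]
native_score = -100.0

rank = native_rank(native_score, decoy_scores)
z = native_z_score(native_score, decoy_scores)
r = pearson(decoy_scores, decoy_rmsds)
```

In this toy set the native ranks first with a strongly negative z-score, yet that alone says nothing about how well the function orders the decoys among themselves. The score-RMSD correlation addresses that question, but it depends on which decoys were sampled, which is the kind of bias the abstract describes.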

SUBMITTER: Handl J 

PROVIDER: S-EPMC2677743 | biostudies-literature | 2009 May

REPOSITORIES: biostudies-literature


Publications

Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction.

Julia Handl, Joshua Knowles, Simon C. Lovell

Bioinformatics (Oxford, England), 2009-03-17, issue 10



Similar Datasets

| S-EPMC3211142 | biostudies-literature
| S-EPMC7427878 | biostudies-literature
| S-EPMC449718 | biostudies-literature
2018-10-18 | PXD008463 | Pride
| S-EPMC4291136 | biostudies-literature