ABSTRACT: BACKGROUND:In drug design, an efficient structure-based optimization of a ligand needs the precise knowledge of the protein-ligand interactions. In the absence of experimental information, docking programs are necessary for ligand positioning, and the choice of a reliable program is essential for the success of such an optimization. The performances of four popular docking programs, Gold, Glide, Surflex and FlexX, were investigated using 100 crystal structures of complexes taken from the Directory of Useful Decoys-Enhanced database. RESULTS:The ligand conformational sampling was rather efficient, with a correct pose found for a maximum of 84 complexes, obtained by Surflex. However, the ranking of the correct poses was not as efficient, with a maximum of 68 top-rank or 75 top-4 rank correct poses given by Glidescore. No relationship was found between either the sampling or the scoring performance of the four programs and the properties of either the targets or the small molecules, except for the number of ligand rotatable bonds. As well, no exploitable relationship was found between each program performance in docking and in virtual screening; a wrong top-rank pose may obtain a good score that allows it to be ranked among the most active compounds and vice versa. Also, to improve the results of docking, the strengths of the programs were combined either by using a rescoring procedure or the United Subset Consensus (USC). Oddly, positioning with Surflex and rescoring with Glidescore did not improve the results. However, USC based on docking allowed us to obtain a correct pose in the top-4 rank for 87 complexes. Finally, nine complexes were scrutinized, because a correct pose was found by at least one program but poorly ranked by all four programs. Contrarily to what was expected, except for one case, this was not due to weaknesses of the scoring functions. CONCLUSIONS:We conclude that the scoring functions should be improved to detect the correct poses, but sometimes their failure may be due to other varied considerations. To increase the chances of success, we recommend to use several programs and combine their results. Graphical abstract Summary of the results obtained by semi-rigid docking of crystallographic ligands. The docking was done on 100 protein-ligand X-ray structures, taken from the DUD-E database, and using four programs, Glide, Gold, Surflex and FlexX. Based on the docking results, we applied our United Subset Consensus method (USC), for which only the top4-rank poses are relevant. The number of complexes for which the best pose is correct, is represented by the gray boxes, the blue and red boxes correspond to the number of complexes with a correct pose ranked as the top 1 or within the top 4. A pose is considered correct when its root-mean-square deviation from the crystal structure is less than 2 Å.