
Dataset Information


Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB).


ABSTRACT: The number of available protein structures in the Protein Data Bank (PDB) has increased considerably in recent years. Thanks to this growth in structures and complexes, numerous large-scale studies have been carried out in various research areas, e.g., protein-protein interactions, protein-DNA interactions, and drug discovery. While protein redundancy has usually been managed with only a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely recognized but have never been quantitatively assessed, because they are difficult to analyze and compare. Here, we present a dedicated clustering of protein-ligand structures to avoid the bias found in various studies. The methodology combines binding-site superposition, weighted Root Mean Square Deviation (RMSD) assessment, and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations are retained to give an unbiased view of the protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold and gives more refined results than a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analyses or virtual screening studies.
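The abstract describes the pipeline only at a high level. As a rough illustration, the sketch below shows one way to compute a weighted RMSD between binding sites, group them with average-linkage hierarchical clustering under an RMSD cutoff, pick one representative per cluster, and normalize B-factors. It assumes the binding sites are already superposed and atom-matched. The per-atom weights, the 1.0 Å cutoff, the average-linkage criterion, the medoid as representative, and z-score B-factor normalization are all illustrative assumptions, not the authors' published parameters. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def weighted_rmsd(a: Sequence[Coord], b: Sequence[Coord], w: Sequence[float]) -> float:
    """Weighted RMSD between two binding sites that are ASSUMED to be already
    superposed, with atoms matched one-to-one; w holds hypothetical per-atom weights."""
    if not a or not (len(a) == len(b) == len(w)):
        raise ValueError("a, b and w must be non-empty and of equal length")
    sq = sum(wi * sum((p - q) ** 2 for p, q in zip(pa, pb))
             for pa, pb, wi in zip(a, b, w))
    return math.sqrt(sq / sum(w))


def average_linkage_clusters(dist: List[List[float]], cutoff: float) -> List[List[int]]:
    """Agglomerative clustering (average linkage, an assumed criterion): repeatedly
    merge the closest pair of clusters until that pair is farther apart than cutoff."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return sorted(sorted(c) for c in clusters)


def medoid(cluster: List[int], dist: List[List[float]]) -> int:
    """Representative = member with the smallest summed distance to the others
    (an assumed choice); ties resolve to the first member."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


def normalized_bfactors(b: Sequence[float]) -> List[float]:
    """Z-score normalization of B-factors (population SD); an assumed scheme."""
    mean = sum(b) / len(b)
    sd = math.sqrt(sum((x - mean) ** 2 for x in b) / len(b))
    return [(x - mean) / sd for x in b] if sd > 0 else [0.0] * len(b)


# Toy data (hypothetical): site_b is site_a shifted by 0.1 Å, site_c by 3 Å.
site_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
site_b = [(x + 0.1, y, z) for x, y, z in site_a]
site_c = [(x + 3.0, y, z) for x, y, z in site_a]
weights = [1.0, 1.0, 2.0]  # hypothetical per-atom weights
sites = [site_a, site_b, site_c]
dist = [[weighted_rmsd(s, t, weights) for t in sites] for s in sites]
clusters = average_linkage_clusters(dist, cutoff=1.0)  # hypothetical 1.0 Å cutoff
representatives = [medoid(c, dist) for c in clusters]
print(clusters, representatives)  # [[0, 1], [2]] [0, 2]
```

Here the two nearly identical sites collapse into one cluster with a single representative, while the shifted site remains a distinct conformation. A real pipeline would also need the superposition and atom-matching steps that this sketch takes as given.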

SUBMITTER: Shinada NK 

PROVIDER: S-EPMC7139665 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature


Publications

Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB).

Shinada Nicolas K, Schmidtke Peter, de Brevern Alexandre G

International Journal of Molecular Sciences, 2020 Mar 24, issue 6



Similar Datasets

| S-EPMC2840415 | biostudies-literature
| S-EPMC5823500 | biostudies-literature
| S-EPMC6276889 | biostudies-literature
| S-EPMC8166929 | biostudies-literature
| S-EPMC4619230 | biostudies-literature
| S-EPMC4233198 | biostudies-literature
| S-EPMC7371193 | biostudies-literature
| S-EPMC3081969 | biostudies-literature
| S-EPMC1173082 | biostudies-literature
| S-EPMC3992913 | biostudies-literature