Cys.sqlite: A Structured-Information Approach to the Comprehensive Analysis of Cysteine Disulfide Bonds in the Protein Databank.
ABSTRACT: Cysteine is a multifaceted amino acid that is central to the structure and function of many proteins. A disulfide bond formed between two cysteines restrains protein conformations through the strong covalent bond and torsions about the bond that prefer, energetically, ±90°. In this study, we transform over 30 000 Protein Databank files (PDBx/mmCIFs) into a single file, the SQLite database (Cys.sqlite). The database schema is designed to accommodate the structural information on both oxidized and reduced cysteines and to retain essential protein metadata to establish informational and biological provenance. Cys.sqlite contains over 95 000 peptide chains and 500 000 cysteines (700 000 structural conformers); there are over 265 000 cysteine disulfide bond conformations from structures solved with all available experimental methods. The structural information is analyzed with respect to sequence identity cutoff, the experimental method, and energetics of the disulfide. We find that as the experimental information becomes limiting and the influence of modeling becomes more pronounced, the observed average strain increases artificially. The database and analyses presented here can be used to improve the refinement of biological structures from experiments that are known to contain one or more disulfide bonds.
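The kind of analysis the abstract describes (a torsion about the S-S bond, stored in SQLite and summarized by experimental method) can be sketched as follows. This is an illustration only: the table name `disulfide`, its columns, the entry labels, and the coordinates are hypothetical and synthetic. They are not the actual Cys.sqlite schema or data, which this record does not list. The S-S distance of about 2.04 Å is a typical textbook value used as an assumption.

```python
import math
import sqlite3


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond, via atan2."""
    sub = lambda a, b: tuple(x - y for x, y in zip(a, b))
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    cross = lambda a, b: (a[1] * b[2] - a[2] * b[1],
                          a[2] * b[0] - a[0] * b[2],
                          a[0] * b[1] - a[1] * b[0])
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)                  # unit vector along the central bond
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))   # b0 with its component along b1 removed
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))   # b2 with its component along b1 removed
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Synthetic CB-SG-SG'-CB' geometries (not taken from any PDB entry).
cb1, sg1, sg2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.04)
rows = [
    ("demo1", "X-RAY DIFFRACTION", dihedral(cb1, sg1, sg2, (0.0, 1.0, 2.04))),
    ("demo2", "SOLUTION NMR", dihedral(cb1, sg1, sg2, (0.0, -1.0, 2.04))),
]

# Hypothetical single-table layout; the real database is described as richer than this.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disulfide (entry_id TEXT, method TEXT, chi3 REAL)")
conn.executemany("INSERT INTO disulfide VALUES (?, ?, ?)", rows)

# Mean absolute torsion per experimental method.
summary = conn.execute(
    "SELECT method, AVG(ABS(chi3)) FROM disulfide GROUP BY method ORDER BY method"
).fetchall()
for method, mean_abs_chi3 in summary:
    print(f"{method}: mean |chi3| = {mean_abs_chi3:.1f} deg")
```

Keeping the computed torsion as a column next to the method metadata is one simple way to make comparisons such as "by experimental method" a single `GROUP BY` query; the study's own schema may organize this differently.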
SUBMITTER: Fobe TL
PROVIDER: S-EPMC6999612 | biostudies-literature | 2019 Feb
REPOSITORIES: biostudies-literature