Dataset Information

Random sampling of the Protein Data Bank: RaSPDB.

ABSTRACT: A novel and simple procedure (RaSPDB) for mining the Protein Data Bank (PDB) is described. Ten PDB subsets, each containing 7000 randomly selected protein chains, are built and used to make 10 estimations of the average value of a generic feature F, such as the length of the protein chain, the amino acid composition, the crystallographic resolution, or the secondary structure composition. These 10 estimations are then used to compute an average estimation of F together with its standard error. It is verified heuristically that the size of these 10 subsets (7000 protein chains each) is small enough to avoid redundancy within each subset and large enough to guarantee stable estimations across different subsets. RaSPDB has two major advantages over classical procedures aimed at building a single, non-redundant PDB subset: a larger fraction of the information stored in the PDB is used, and the standard error of F can be estimated.
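The resampling scheme in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `raspdb_estimate` and its parameters are hypothetical; chains are drawn without replacement within each subset, but each subset is drawn independently (the abstract does not say whether chains may recur across subsets); and the standard error is assumed to be the standard error of the mean of the 10 estimates (sample standard deviation divided by the square root of 10), which may differ from the exact definition used in the paper.

```python
import math
import random
import statistics


def raspdb_estimate(chains, feature, n_subsets=10, subset_size=7000, seed=None):
    """Hypothetical sketch of RaSPDB-style repeated random sampling.

    chains      -- list of chain records (any type accepted by `feature`)
    feature     -- callable mapping one chain record to a number (feature F)
    n_subsets   -- number of random subsets (10 in the abstract)
    subset_size -- chains per subset (7000 in the abstract)
    Returns (average estimate of F, assumed standard error of that average).
    """
    if n_subsets < 2:
        raise ValueError("need at least two subsets to estimate a standard error")
    if subset_size > len(chains):
        raise ValueError("subset_size exceeds the number of available chains")
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_subsets):
        # Sample without replacement inside one subset; subsets are drawn
        # independently, so a chain may appear in more than one subset.
        subset = rng.sample(chains, subset_size)
        estimates.append(statistics.fmean(feature(c) for c in subset))
    average = statistics.fmean(estimates)
    # Assumption: standard error of the mean of the subset estimates.
    std_error = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return average, std_error


if __name__ == "__main__":
    # Hypothetical toy data: 20000 synthetic chain lengths, not real PDB values.
    toy_rng = random.Random(1)
    lengths = [toy_rng.randint(30, 800) for _ in range(20000)]
    mean_len, se_len = raspdb_estimate(lengths, feature=lambda n: n, seed=42)
    print(f"mean chain length = {mean_len:.1f} +/- {se_len:.1f}")
```

With real data, `chains` would hold PDB chain records and `feature` would extract, for example, the chain length or the crystallographic resolution; compositional features such as amino acid or secondary structure fractions would be handled by calling the function once per component.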

SUBMITTER: Carugo O

PROVIDER: S-EPMC8683422 | biostudies-literature | 2021 Dec

REPOSITORIES: biostudies-literature

Publications

Random sampling of the Protein Data Bank: RaSPDB.

Oliviero Carugo

Scientific Reports, 17 December 2021, issue 1

PMID: 34921198

Similar Datasets

The Membrane Protein Data Bank.
| S-EPMC2792347 | biostudies-literature

Enriched Conformational Sampling of DNA and Proteins with a Hybrid Hamiltonian Derived from the Protein Data Bank.
| S-EPMC6274895 | biostudies-literature

PDBe: Protein Data Bank in Europe.
| S-EPMC2808887 | biostudies-literature

PDBe: Protein Data Bank in Europe.
| S-EPMC3013808 | biostudies-literature

PDBe: Protein Data Bank in Europe.
| S-EPMC3965016 | biostudies-literature

PDBe: Protein Data Bank in Europe.
| S-EPMC3245096 | biostudies-literature

Trendspotting in the Protein Data Bank.
| S-EPMC4068610 | biostudies-literature

Analysis of crystallization data in the Protein Data Bank.
| S-EPMC4601584 | biostudies-literature

Announcing the launch of Protein Data Bank China as an Associate Member of the Worldwide Protein Data Bank Partnership.
| S-EPMC10478634 | biostudies-literature

The future of the Protein Data Bank.
| S-EPMC3684242 | biostudies-literature