Dataset Information

Validation of coevolving residue algorithms via pipeline sensitivity analysis: ELSC and OMES and ZNMI, oh my!

ABSTRACT: Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard-to-control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second and more important, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions.
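
The reproducibility idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline and it does not implement ZNMI, whose formula is not given in this record. As a stand-in it scores column pairs with plain mutual information divided by joint entropy, then compares the top-scoring pairs obtained from two disjoint random halves of an alignment. All function names, the choice of k, and the toy alignment are assumptions made for illustration; the accuracy check against tertiary-structure distances is omitted because it needs structural data.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(symbols):
    """Shannon entropy (in nats) of a sequence of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def normalized_mi(col_a, col_b):
    """Mutual information of two alignment columns divided by their joint entropy.

    Assumption: a generic normalization used as a stand-in, not ZNMI itself.
    """
    h_joint = entropy(list(zip(col_a, col_b)))
    if h_joint == 0.0:
        return 0.0  # both columns fully conserved, so there is nothing to share
    return (entropy(col_a) + entropy(col_b) - h_joint) / h_joint


def score_pairs(alignment):
    """Score every pair of columns in an alignment (equal-length strings)."""
    columns = list(zip(*alignment))
    return {(i, j): normalized_mi(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}


def split_half_scores(alignment, n_splits=10, seed=0):
    """Yield pair scores for two disjoint random halves, n_splits times."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        rows = list(alignment)
        rng.shuffle(rows)
        half = len(rows) // 2
        yield score_pairs(rows[:half]), score_pairs(rows[half:])


def top_pair_overlap(scores_a, scores_b, k=3):
    """Fraction of the k top-scoring pairs shared by two score tables."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(scores_a) & top(scores_b)) / k


# Toy alignment (hypothetical data): columns 0 and 1 always change together.
toy_alignment = ["ACDA", "ACDC", "GTDA", "GTDC", "ACEA", "GTEC"]
overlaps = [top_pair_overlap(a, b, k=2)
            for a, b in split_half_scores(toy_alignment, n_splits=5)]
mean_overlap = sum(overlaps) / len(overlaps)
```

Here `mean_overlap` is a crude reproducibility score between 0 and 1: values near 1 mean the same pairs rank highly no matter which half of the sequences is used. A real analysis would use far larger alignments and the partitioning scheme described in the paper.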

SUBMITTER: Brown CA

PROVIDER: S-EPMC2879359 | biostudies-literature | 2010

REPOSITORIES: biostudies-literature


Publications

Validation of coevolving residue algorithms via pipeline sensitivity analysis: ELSC and OMES and ZNMI, oh my!

Christopher A. Brown, Kevin S. Brown

PLoS ONE, 1 June 2010, issue 6


PMID: 20531955

Similar Datasets

Project description: Background: While there has been much discussion by policymakers and stakeholders about the effects of "secondary patents" on the pharmaceutical industry, there is no empirical evidence on their prevalence or determinants. Characterizing the landscape of secondary patents is important in light of recent court decisions in the U.S. that may make them more difficult to obtain, and for developing countries considering restrictions on secondary patents. Methodology/principal findings: We read the claims of the 1304 Orange Book listed patents on all new molecular entities approved in the U.S. between 1988 and 2005, and coded the patents as including chemical compound claims (claims covering the active molecule itself) and/or one of several types of secondary claims. We distinguish between patents with any secondary claims, and those with only secondary claims and no chemical compound claims ("independent" secondary patents). We find that secondary claims are common in the pharmaceutical industry. We also show that independent secondary patents tend to be filed and issued later than chemical compound patents, and are also more likely to be filed after the drug is approved. When present, independent formulation patents add an average of 6.5 years of patent life (95% C.I.: 5.9 to 7.3 years), independent method of use patents add 7.4 years (95% C.I.: 6.4 to 8.4 years), and independent patents on polymorphs, isomers, prodrug, ester, and/or salt claims add 6.3 years (95% C.I.: 5.3 to 7.3 years). We also provide evidence that late-filed independent secondary patents are more common for higher sales drugs. Conclusions/significance: Policies and court decisions affecting secondary patenting are likely to have a significant impact on the pharmaceutical industry. Secondary patents provide substantial additional patent life in the pharmaceutical industry, at least nominally. Evidence that they are also more common for best-selling drugs is consistent with accounts of active "life cycle management" or "evergreening" of patent portfolios in the industry.


