Dataset Information

Learning from decoys to improve the sensitivity and specificity of proteomics database search results.

ABSTRACT: The statistical validation of database search results is a complex issue in bottom-up proteomics. The score distributions of correct and incorrect peptide spectrum matches (PSMs) overlap significantly, making an accurate assessment of true peptide matches challenging. Since complete separation between true and false hits is practically never achieved, there is a need for better methods and rescoring algorithms to improve upon the primary database search results. Here we describe the calibration and False Discovery Rate (FDR) estimation of database search scores through a dynamic FDR calculation method, FlexiFDR, which increases both the sensitivity and specificity of search results. By fitting a simple linear regression to the decoy hits for each charge state, the method maximized the number of true positives and reduced the number of false negatives in several standard datasets of varying complexity (18-mix, 49-mix, 200-mix) and a few complex datasets (E. coli and yeast) obtained from a wide variety of MS platforms. The net gains in correct spectral and peptide identifications were up to 14.81% and 6.2%, respectively. The approach is applicable to different search methodologies: separate as well as concatenated target-decoy database searches, high-mass-accuracy searches, and semi-tryptic and modification searches. FlexiFDR also improved performance when applied to Mascot results. We have shown that an appropriate threshold learnt from decoys can be very effective in improving database search results. FlexiFDR adapts itself to different instruments, data types and MS platforms: it learns from the decoy hits and sets a flexible threshold that automatically aligns itself with the underlying variables of data quality and size.
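The abstract does not spell out FlexiFDR's regression model, so the sketch below illustrates only the conventional baseline it builds on: a target-decoy FDR score cutoff, computed once for all PSMs and once separately for each precursor charge state. It is not the authors' implementation. The PSM tuple layout, the function names (`score_threshold`, `per_charge_thresholds`, `count_accepted_targets`), the "higher score is better" convention, and the toy scores are all hypothetical assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical PSM layout: (score, precursor charge, is_decoy).
# Assumes higher scores indicate better matches.
PSM = Tuple[float, int, bool]


def score_threshold(psms: List[PSM], fdr: float = 0.01) -> float:
    """Return the lowest score cutoff s whose estimated FDR is <= fdr.

    Uses the separate-search estimate FDR(s) = decoys(>= s) / targets(>= s).
    (Concatenated searches often use 2 * decoys / (targets + decoys) instead.)
    Returns math.inf when no cutoff meets the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = math.inf
    for i, (score, _charge, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # A cutoff at `score` admits every tied PSM, so only evaluate
        # the FDR once the whole run of equal scores has been counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= fdr:
            cutoff = score
    return cutoff


def per_charge_thresholds(psms: List[PSM], fdr: float = 0.01) -> Dict[int, float]:
    """Compute an independent score cutoff for each precursor charge state."""
    by_charge: Dict[int, List[PSM]] = defaultdict(list)
    for psm in psms:
        by_charge[psm[1]].append(psm)
    return {z: score_threshold(group, fdr) for z, group in sorted(by_charge.items())}


def count_accepted_targets(psms: List[PSM], thresholds: Dict[int, float]) -> int:
    """Count target PSMs scoring at or above the cutoff for their charge state."""
    return sum(1 for score, z, is_decoy in psms
               if not is_decoy and score >= thresholds[z])


# Made-up toy scores: charge-3 PSMs score systematically higher than charge-2 PSMs.
toy_psms: List[PSM] = [
    (50, 2, False), (45, 2, False), (40, 2, False), (30, 2, False), (20, 2, False),
    (35, 2, True), (25, 2, True),
    (70, 3, False), (68, 3, False), (66, 3, False), (64, 3, False),
    (62, 3, True), (40, 3, True),
]

if __name__ == "__main__":
    # A loose 30% FDR is used only because the toy set is tiny; 1% is more typical.
    per_charge = per_charge_thresholds(toy_psms, fdr=0.3)
    pooled = score_threshold(toy_psms, fdr=0.3)
    print("per-charge cutoffs:", per_charge)
    print("accepted (per charge):", count_accepted_targets(toy_psms, per_charge))
    print("accepted (pooled):", count_accepted_targets(toy_psms, {2: pooled, 3: pooled}))
```

On these toy scores, per-charge cutoffs accept 8 target PSMs versus 7 for a single pooled cutoff at the same nominal FDR. This only shows why charge-specific thresholds can help when score distributions differ by charge; it does not reproduce the gains reported in the paper.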

SUBMITTER: Yadav AK

PROVIDER: S-EPMC3506577 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

Learning from decoys to improve the sensitivity and specificity of proteomics database search results.

Yadav AK, Kumar D, Dash D

PLoS One, 2012-11-26, issue 11


PMID: 23189209

Similar Datasets

Project description: Training can modify the visual system to produce a substantial improvement on perceptual tasks and therefore has applications for treating visual deficits. Visual perceptual learning (VPL) is often specific to the trained feature, which gives insight into processes underlying brain plasticity, but limits VPL's effectiveness in rehabilitation. Under what circumstances VPL transfers to untrained stimuli is poorly understood. Here we report a qualitatively new phenomenon: intrinsic variation in the representation of features determines the transfer of VPL. Orientations around cardinal are represented more reliably than orientations around oblique in V1, which has been linked to behavioral consequences such as visual search asymmetries. We studied VPL for visual search of near-cardinal or oblique targets among distractors of the other orientation while controlling for other display and task attributes, including task precision, task difficulty, and stimulus exposure. Learning was the same in all training conditions; however, transfer depended on the orientation of the target, with full transfer of learning from near-cardinal to oblique targets but not the reverse. To evaluate the idea that representational reliability was the key difference between the orientations in determining VPL transfer, we created a model that combined orientation-dependent reliability, improvement of reliability with learning, and an optimal search strategy. Modeling suggested that not only search asymmetries but also the asymmetric transfer of VPL depended on preexisting differences between the reliability of near-cardinal and oblique representations. Transfer asymmetries in model behavior also depended on having different learning rates for targets and distractors, such that greater learning for low-reliability distractors facilitated transfer.
These findings suggest that training on sensory features with intrinsically low reliability may maximize the generalizability of learning in complex visual environments.
