Dataset Information

Mass spectrometry-based protein identification with accurate statistical significance assignment.

ABSTRACT:

Motivation

Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of the scientific conclusions drawn from them. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein-level statistics remain challenging.

Results

We have constructed a protein identification (ID) method that combines the peptide evidence for a candidate protein using a rigorous formula derived earlier; in this formula, the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein-level E-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with those of the other methods tested.
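
The abstract does not state the Sorić formula itself. As a rough illustration only, the sketch below assumes one commonly used form, in which the proportion of false discoveries (PFD) at an E-value cutoff is approximated by the cutoff divided by the number of proteins reported at or below it. The function name `estimated_pfd`, the example E-values, and the exact form of the estimate are assumptions made for illustration and are not taken from the RAId source code.

```python
from typing import Sequence


def estimated_pfd(e_values: Sequence[float], cutoff: float) -> float:
    """Approximate the proportion of false discoveries at an E-value cutoff.

    Assumed form (not stated in the abstract): PFD ~ cutoff / D(cutoff),
    where D(cutoff) is the number of proteins with E-value <= cutoff.
    The estimate is capped at 1.0 because it is a proportion.
    """
    # Count the proteins that would be reported at this cutoff.
    discoveries = sum(1 for e in e_values if e <= cutoff)
    if discoveries == 0:
        # Nothing is reported, so there are no false discoveries to count.
        return 0.0
    # The E-value cutoff is the expected number of false proteins
    # among those reported, under the assumption stated above.
    return min(1.0, cutoff / discoveries)


if __name__ == "__main__":
    # Hypothetical protein-level E-values, for illustration only.
    protein_e_values = [0.001, 0.01, 0.5, 2.0, 10.0]
    print(estimated_pfd(protein_e_values, cutoff=1.0))
```

An estimate of this kind is only meaningful when the E-values are themselves accurate, which is the property the abstract claims for the protein-level statistics.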

Availability and implementation

The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit.

SUBMITTER: Alves G

PROVIDER: S-EPMC4341067 | biostudies-literature | 2015 Mar

REPOSITORIES: biostudies-literature

Publications

Mass spectrometry-based protein identification with accurate statistical significance assignment.

Alves G, Yu YK

Bioinformatics (Oxford, England), 2014 Oct 31, issue 5

PMID: 25362092

Similar Datasets

Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. | S-EPMC4723618 | biostudies-literature

Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. | S-EPMC6061032 | biostudies-literature

RAId: Knowledge-Integrated Proteomics Web Service with Accurate Statistical Significance Assignment. | S-EPMC6635056 | biostudies-literature

Mass spectrometry-based detection and assignment of protein posttranslational modifications. | S-EPMC4301092 | biostudies-other

Accurate mass spectrometry based protein quantification via shared peptides. | S-EPMC3317402 | biostudies-literature

PhosSA: Fast and accurate phosphorylation site assignment algorithm for mass spectrometry data. | S-EPMC3909108 | biostudies-literature

Detection and prevalence of monoclonal gammopathy of undetermined significance: a study utilizing mass spectrometry-based monoclonal immunoglobulin rapid accurate mass measurement. | S-EPMC6910906 | biostudies-literature

Trap column-based intact mass spectrometry for rapid and accurate evaluation of protein molecular weight. | S-EPMC9126647 | biostudies-literature

A statistical framework for accurate taxonomic assignment of metagenomic sequencing reads. | S-EPMC3462201 | biostudies-literature

Variability analysis of human plasma and cerebral spinal fluid reveals statistical significance of changes in mass spectrometry-based metabolomics data. | S-EPMC3058611 | biostudies-literature