Dataset Information

Fit-for-purpose curated database application in mass spectrometry-based targeted protein identification and validation.

ABSTRACT: BACKGROUND: Mass spectrometry (MS) is a very sensitive and specific method for protein identification, biomarker discovery, and biomarker validation. Protein identification is commonly carried out by comparing MS data with public databases. However, with the development of high throughput and accurate genomic sequencing technology, public databases are being overwhelmed with new entries from different species every day. The application of these databases can also be problematic due to factors such as size, specificity, and unharmonized annotation of the molecules of interest. Current databases representing liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based searches focus on enzyme digestion patterns and sequence information and consequently, important functional information can be missed within the search output. Protein variants displaying similar sequence homology can interfere with database identification when only certain homologues are examined. In addition, recombinant DNA technology can result in products that may not be accurately annotated in public databases. Curated databases, which focus on the molecule of interest with clearer functional annotation and sequence information, are necessary for accurate protein identification and validation. Here, four cases of curated database application have been explored and summarized. FINDINGS: The four presented curated databases were constructed with clear goals regarding application and have proven very useful for targeted protein identification and biomarker application in different fields. They include a sheeppox virus database created for accurate identification of proteins with strong antigenicity, a custom database containing clearly annotated protein variants such as tau transcript variant 2 for accurate biomarker identification, a sheep-hamster chimeric prion protein (PrP) database constructed for assay development of prion diseases, and a custom Escherichia coli (E. coli) flagella (H antigen) database produced for MS-H, a new H-typing technique. Clearly annotating the proteins of interest was essential for highly accurate, specific, and sensitive sequence identification, and searching against public databases resulted in inaccurate identification of the sequence of interest, while combining the curated database with a public database reduced both the confidence and sequence coverage of the protein search. CONCLUSION: Curated protein sequence databases incorporating clear annotations are very useful for accurate protein identification and fit-for-purpose application through MS-based biomarker validation.

SUBMITTER: Cheng K

PROVIDER: S-EPMC4102332 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Fit-for-purpose curated database application in mass spectrometry-based targeted protein identification and validation.

Keding Cheng, Angela Sloan, Stuart McCorrister, Shawn Babiuk, Timothy R Bowden, Gehua Wang, J David Knox

BMC Research Notes, 2014-07-10

PMID: 25011440

Similar Datasets

Project description: Adoption of targeted mass spectrometry (MS) approaches such as multiple reaction monitoring (MRM) to study biological and biomedical questions is well underway in the proteomics community. Successful application depends on the ability to generate reliable assays that uniquely and confidently identify target peptides in a sample. Unfortunately, there is a wide range of criteria being applied to say that an assay has been successfully developed. There is no consensus on what criteria are acceptable and little understanding of the impact of variable criteria on the quality of the results generated. Publications describing targeted MS assays for peptides frequently do not contain sufficient information for readers to establish confidence that the tests work as intended or to be able to apply the tests described in their own labs. Guidance must be developed so that targeted MS assays with established performance can be made widely distributed and applied by many labs worldwide. To begin to address the problems and their solutions, a workshop was held at the National Institutes of Health with representatives from the multiple communities developing and employing targeted MS assays. Participants discussed the analytical goals of their experiments and the experimental evidence needed to establish that the assays they develop work as intended and are achieving the required levels of performance. Using this "fit-for-purpose" approach, the group defined three tiers of assays distinguished by their performance and extent of analytical characterization. Computational and statistical tools useful for the analysis of targeted MS results were described. Participants also detailed the information that authors need to provide in their manuscripts to enable reviewers and readers to clearly understand what procedures were performed and to evaluate the reliability of the peptide or protein quantification measurements reported. This paper presents a summary of the meeting and recommendations.

Project description: BACKGROUND: Nocardiosis, despite its rarity and underreporting, is significant due to its severe impact, characterized by high morbidity and mortality rates. The development of a precise, reliable, rapid, and straightforward technique for identifying the pathogenic agent in clinical specimens is crucial to reduce fatality rates and facilitate timely antimicrobial treatment. In this study, we aimed to identify Nocardia spp. in clinical isolates, using MALDI-TOF MS as the primary method, with molecular methods as the gold standard. Clinical Nocardia isolates were identified using 16S rRNA/hsp65/gyrB/secA1/rpoB gene sequencing. Identification performance of the Bruker MALDI Biotyper 3.1 (V09.0.0.0_8468) and MBT Compass 4.1 (V11.0.0.0_10833) for Nocardia identification was evaluated. RESULTS: Seventy-six Nocardia isolates were classified into 12 species through gene sequencing. The MALDI Biotyper 3.1 (V09.0.0.0_8468) achieved 100% genus-level accuracy and 84.2% species accuracy (64/76). The MBT Compass 4.1 with the BDAL Database (V11.0.0.0_10833) improved species identification to 98.7% (75/76). The updated database enhanced species level identification with scores > 1.7, increasing from 77.6% (59/76) to 94.7% (72/76), a significant improvement (P = 0.001). The new and simplified extraction increased the proportion of strains identified to the species level with scores > 1.7 from 62.0% (18/29) to 86.2% (25/29) (P = 0.016). An in-house library construction ensured accurate species identification for all isolates. CONCLUSIONS: The Bruker mass spectrometer can accurately identify Nocardia species, albeit with some variations observed between different database versions. The MALDI Biotyper 3.1 (V09.0.0.0_8468) has limitations in identifying Nocardia brasiliensis, with some strains only identifiable to the genus level. MBT Compass 4.1 (V11.0.0.0_10833) effectively addresses this shortfall, improving species identification accuracy to 98.7%, and offering quick and reliable identification of Nocardia. Both database versions incorrectly identified the clinically less common Nocardia sputorum as Nocardia araoensis. For laboratories that have not upgraded their databases and are unable to achieve satisfactory identification results for Nocardia, employing the new and simplified extraction method can provide a degree of improvement in identification outcomes.
