Dataset Information

A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses.

ABSTRACT: BACKGROUND: Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few other masses are given in addition. For this setting, we ask for an amino acid string that explains the given masses as accurately as possible. RESULTS: Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. We propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Proof-of-concept experiments on measurements of synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible. CONCLUSIONS: We conclude that considering the symmetric difference as optimization goal can improve the identification rates for de novo peptide sequencing. A preliminary version of this work has been presented at WABI 2016.
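
The scoring idea in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm, which searches efficiently for the best (and k best) strings; it only scores one candidate string against a set of given masses. The sketch makes several simplifying assumptions: residue masses are rounded to nominal integer values, ion-type offsets are ignored, masses must match exactly (no tolerance), and only proper prefixes and suffixes are considered. All function names are hypothetical.

```python
# Nominal (integer) residue masses in Da. This is a simplification for
# illustration; real spectra require accurate masses and a tolerance.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def explained_masses(peptide):
    """Masses of all proper prefixes and suffixes, found in one linear scan."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    explained = set()
    running = 0
    for m in masses[:-1]:
        running += m
        explained.add(running)          # prefix mass
        explained.add(total - running)  # mass of the complementary suffix
    return explained


def symmetric_difference_score(peptide, measured):
    """Count unexplained given masses plus explained masses that were not given.

    Lower is better under the symmetric difference model.
    """
    return len(explained_masses(peptide) ^ set(measured))


def explained_count_score(peptide, measured):
    """Count given masses that the string explains (the earlier criterion).

    Higher is better.
    """
    return len(explained_masses(peptide) & set(measured))


if __name__ == "__main__":
    # Hypothetical noisy input for the peptide "GAS":
    # the suffix mass 158 is missing and the mass 100 is spurious.
    measured = {57, 87, 100, 128}
    for candidate in ("GAS", "AGS"):
        print(candidate,
              symmetric_difference_score(candidate, measured),
              explained_count_score(candidate, measured))
```

In this toy input, the candidate "GAS" is penalized once for the given mass it does not explain (100) and once for the mass it explains that was not given (158), whereas the count-based score only rewards the three matches.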

SUBMITTER: Tschager T

PROVIDER: S-EPMC5464308 | biostudies-other | 2017

REPOSITORIES: biostudies-other

Publications

A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses.

Thomas Tschager, Simon Rösch, Ludovic Gillet, Peter Widmayer

Algorithms for Molecular Biology: AMB, 2017-05-11

PMID: 28603547

Similar Datasets

De novo design of symmetric ferredoxins that shuttle electrons in vivo.
| S-EPMC6642340 | biostudies-literature

Constrained de novo sequencing of conotoxins.
| S-EPMC3412931 | biostudies-literature

De novo peptide sequencing by deep learning.
| S-EPMC5547637 | biostudies-literature

Multiplex de novo sequencing of peptide antibiotics.
| S-EPMC3216106 | biostudies-literature

Building a better fragment library for de novo protein structure prediction.
| S-EPMC4406757 | biostudies-literature

A simplified scoring system in de novo follicular lymphoma treated initially with immunochemotherapy.
| S-EPMC6034646 | biostudies-literature

Prioritizing de novo autism risk variants with calibrated gene- and variant-scoring models.
| S-EPMC8938308 | biostudies-literature

Dereplication and de novo sequencing of nonribosomal peptides.
| S-EPMC2754211 | biostudies-literature

Novor: real-time peptide de novo sequencing software.
| S-EPMC4604512 | biostudies-literature

Automated de novo protein sequencing of monoclonal antibodies.
| S-EPMC2891972 | biostudies-literature