Project description:One direct route for the discovery of therapeutic human monoclonal antibodies (mAbs) involves the isolation of peripheral B cells from survivors/sero-positive individuals after exposure to an infectious reagent or disease etiology, followed by single-cell sequencing or hybridoma generation. Peripheral B cells, however, are not always easy to obtain and represent only a small percentage of the total B-cell population across all bodily tissues. Although it has been demonstrated that tandem mass spectrometry (MS/MS) techniques can interrogate the full polyclonal antibody (pAb) response to an antigen in vivo, all current approaches identify MS/MS spectra against databases derived from genetic sequencing of B cells from the same patient. In this proof-of-concept study, we demonstrate the feasibility of a novel MS/MS antibody discovery approach in which only serum antibodies are required without the need for sequencing of genetic material. Peripheral pAbs from a cytomegalovirus-exposed individual were purified by glycoprotein B antigen affinity and de novo sequenced from MS/MS data. Purely MS-derived mAbs were then manufactured in mammalian cells to validate potency via antigen-binding ELISA. Interestingly, we found that these mAbs accounted for 1 to 2% of total donor IgG but were not detected in parallel sequencing of memory B cells from the same patient.
Project description:Our study is a large-scale evaluation of de novo peptide sequencing tools using both publicly and proprietary heterogeneous datasets spanning multiple species, mass spectrometers, post-translational modifications, and a wide range of use cases. This dataset contains a publicly available subset of this dataset.
The dataset includes raw MS/MS spectra in their original vendor formats as well as in mzML and Mascot Generic Format (MGF). Additionally, peptide spectral hits (PSMs) identified through a database search and rescoring workflow, along with the dataset-specific protein databases used for annotation, are provided.
Project description:Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein using from bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows require machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performances using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of highly N- and C-termini diverse peptides generalized 76% more over the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion with five proteases of several species samples led to a 2-3 fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10000 matching and overlapped peptides across six different proteases mAb samples, achieving a 100% sequence coverage for 8 of the ten polypeptide chains. We foretell that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as analyzing samples of high complexity, unknown nature, or the peptidomics field.
Project description:Dependent on concise, pre-defined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large scale proteomics datasets, and present a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) which leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to other algorithms and improves discriminability of correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.
Project description:Methylation of Hepatitis B Virus (HBV) DNA in a CpG context (5mCpG) can alter the expression patterns of viral genes related to infection and cellular transformation. Moreover, it may also provide clues to why certain infections are cleared, or persist with or without progression to cancer. The detection of 5mCpG often requires techniques that damage DNA or introduce bias through a myriad of limitations. Therefore, we developed a method for the detection of 5mCpG on the HBV genome that does not rely on bisulfite conversion or PCR. Moreover, using the developed technique, we have provided the first de novo assembly of native HBV DNA, as well as the first landscape of 5mCpG from native HBV sequences
Project description:Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.
Project description:In this work, we explore the potential of the metalloendopeptidase Lys-N for MALDI-MS/MS proteomics applications. Initially we digested a HEK293 cellular lysate with Lys-N and, for comparison, in parallel with the protease Lys-C. The resulting peptides were separated by strong cation exchange to enrich and isolate peptides containing a single N-terminal lysine. MALDI-MS/MS analysis of these peptides yielded CID spectra with clear and often complete sequence ladders of b-ions. To test the applicability for de novo sequencing we next separated an ostrich muscle tissue protein lysate by one-dimensional SDS-PAGE. A protein band at 42 kDa was in-gel digested with Lys-N. Relatively straightforward sequencing resulted in the de novo identification of the two ostrich proteins creatine kinase and actin. We therefore conclude that this method that combines Lys-N, strong cation exchange enrichment, and MALDI-MS/MS analysis provides a valuable alternative proteomics strategy.
Project description:De novo peptide sequencing is a fundamental research area in mass spectrometry (MS) based proteomics. However, those methods have often been evaluated using a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) and the significance of de novo peptide-spectrum matches (PSMs). Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species), and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide sequencing methods on target-decoy spectra and to estimate their FDRs. Our results thoroughly reveal the strengths and weaknesses of different de novo peptide sequencing methods, and how their performances depend on specific applications and the types of data. Our FDR estimation also shows that some tools may perform better than the others in distinguishing between de novo PSMs and random matches, and can be used to assess the significance of de novo PSMs.