Dataset Information

Construction of Human Proteoform Families from 21 Tesla Fourier Transform Ion Cyclotron Resonance Mass Spectrometry Top-Down Proteomic Data.

ABSTRACT: Identification of proteoforms, the different forms of a protein, is important to understand biological processes. A proteoform family is the set of different proteoforms from the same gene. We previously developed the software program Proteoform Suite, which constructs proteoform families and identifies proteoforms by intact-mass analysis. Here, we have applied this approach to top-down proteomic data acquired at the National High Magnetic Field Laboratory 21 tesla Fourier transform ion cyclotron resonance mass spectrometer (data available on the MassIVE platform with identifier MSV000085978). We explored the ability to construct proteoform families and identify proteoforms from the high mass accuracy data that this instrument provides for a complex cell lysate sample from the MCF-7 human breast cancer cell line. There were 2830 observed experimental proteoforms, of which 932 were identified, 44 were ambiguous, and 1854 were unidentified. Of the 932 unique identified proteoforms, 766 were identified by top-down MS2 analysis at 1% false discovery rate (FDR) using TDPortal, and 166 were additional intact-mass identifications (∼4.7% calculated global FDR) made using Proteoform Suite. We recently published a proteoform level schema to represent ambiguity in proteoform identifications. We implemented this proteoform level classification in Proteoform Suite for intact-mass identifications, which enables users to determine the ambiguity levels and sources of ambiguity for each intact-mass proteoform identification.
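The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers; the variable names and the `pct` helper are illustrative and are not part of Proteoform Suite or TDPortal.

```python
# Counts copied from the abstract; names are illustrative only.
observed = 2830
identified, ambiguous, unidentified = 932, 44, 1854
top_down_ids, intact_mass_ids = 766, 166

# The three outcome classes should account for every observed proteoform,
# and the two identification routes should account for every identification.
assert identified + ambiguous + unidentified == observed
assert top_down_ids + intact_mass_ids == identified


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of observed experimental proteoforms in each outcome class.
outcome_share = {
    "identified": pct(identified, observed),
    "ambiguous": pct(ambiguous, observed),
    "unidentified": pct(unidentified, observed),
}

# Share of identifications contributed by each route.
route_share = {
    "top-down MS2 (TDPortal)": pct(top_down_ids, identified),
    "intact mass (Proteoform Suite)": pct(intact_mass_ids, identified),
}

print(outcome_share)
print(route_share)
```

Under these numbers, roughly one third of the observed proteoforms were identified, and intact-mass analysis contributed a bit under one fifth of those identifications.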

SUBMITTER: Schaffer LV

PROVIDER: S-EPMC7775878 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature

Publications

Construction of Human Proteoform Families from 21 Tesla Fourier Transform Ion Cyclotron Resonance Mass Spectrometry Top-Down Proteomic Data.

Leah V. Schaffer, Lissa C. Anderson, David S. Butcher, Michael R. Shortreed, Rachel M. Miller, Caitlin Pavelec, Lloyd M. Smith

Journal of Proteome Research, 2020-10-19, issue 1

PMID: 33074679

Similar Datasets

Project description: Mass spectrometry (MS) based top-down proteomics provides rich information about proteoforms arising from combinatorial amino acid sequence variations and post-translational modifications (PTMs). Fourier transform ion cyclotron resonance (FT-ICR) MS affords ultrahigh resolving power and provides high-accuracy mass measurements, presenting a powerful tool for top-down MS characterization of proteoforms. However, the detection and characterization of large proteins from complex mixtures remain challenging due to the exponential decrease in signal-to-noise ratio (S:N) with increasing molecular weight (MW) and coeluting low-MW proteins; thus, size-based fractionation of complex protein mixtures prior to MS analysis is necessary. Here, we directly combine MS-compatible serial size exclusion chromatography (sSEC) fractionation with 12 T FT-ICR MS for targeted top-down characterization of proteins from complex mixtures extracted from human and swine heart tissue. Benefiting from the ultrahigh resolving power of FT-ICR, we isotopically resolved 31 distinct proteoforms (30-50 kDa) simultaneously in a single mass spectrum within a 100 m/z window. Notably, within a 5 m/z window, we obtained baseline isotopic resolution for 6 distinct large proteoforms (30-50 kDa). The ultrahigh resolving power of FT-ICR MS combined with sSEC fractionation enabled targeted top-down analysis of large proteoforms (>30 kDa) from the human heart proteome without extensive chromatographic separation or protein purification. Further separation of proteoforms inside the mass spectrometer (in-MS) allowed for isolation of individual proteoforms and targeted electron capture dissociation (ECD), yielding high sequence coverage. sSEC/FT-ICR ECD facilitated the identification and sequence characterization of important metabolic enzymes.
This platform, which facilitates deep interrogation of proteoform primary structure, is highly tunable, allows for adjustment of MS and MS/MS parameters in real time, and can be utilized for a variety of complex protein mixtures.

Project description: Calmodulin (CaM) is a highly conserved, ubiquitous, calcium-binding protein; it binds to and regulates many different protein targets, thereby functioning as a calcium sensor and signal transducer. CaM contains 9 methionine (Met), 1 histidine (His), 17 aspartic acid (Asp), and 23 glutamic acid (Glu) residues, all of which can potentially react with platinum compounds; thus, one-third of the CaM sequence is a possible binding target of platinum anticancer drugs, which represents a major challenge for identification of specific platinum modification sites. Here, top-down electron capture dissociation (ECD) was used to elucidate the transition metal-platinum(II) modification sites. By using a combination of top-down and bottom-up mass spectrometric (MS) approaches, 10 specific binding sites for mononuclear complexes, cisplatin and [Pt(dien)Cl]Cl, and dinuclear complex [{cis-PtCl(2)(NH(3))}(2)(μ-NH(2)(CH(2))(4)NH(2))] on CaM were identified. High resolution MS of cisplatin-modified CaM revealed that cisplatin mainly targets Met residues in solution at low molar ratios of cisplatin:CaM (2:1), by cross-linking Met residues. At a high molar ratio of cisplatin:CaM (8:1), up to 10 platinum(II) bind to Met, Asp, and Glu residues. [{cis-PtCl(2)(NH(3))}(2)(μ-NH(2)(CH(2))(4)NH(2))] forms mononuclear adducts with CaM. The alkanediamine linker between the two platinum centers dissociates due to a trans-labilization effect. [Pt(dien)Cl]Cl forms {Pt(dien)}(2+) adducts with CaM, and the preferential binding sites were identified as Met51, Met71, Met72, His107, Met109, Met124, Met144, Met145, Glu45 or Glu47, and Asp122 or Glu123. The binding of these complexes to CaM, particularly when binding involves loss of all four original ligands, is largely irreversible, which could result in their failure to reach the target DNA or be responsible for unwanted side-effects during chemotherapy.
Additionally, the cross-linking of cisplatin to CaM might lead to the loss of the biological function of CaM or CaM-Ca(2+) due to limiting the flexibility of the CaM or CaM-Ca(2+) complex to recognize target proteins or blocking the binding region of target proteins to CaM.
