Dataset Information

Construction of confidence regions for isotopic abundance patterns in LC/MS data sets for rigorous determination of molecular formulas.

ABSTRACT: It has long been recognized that estimates of isotopic abundance patterns may be instrumental in identifying the many unknown compounds encountered when conducting untargeted metabolic profiling using liquid chromatography/mass spectrometry. While numerous methods have been developed for assigning heuristic scores to rank the degree of fit of the observed abundance patterns with theoretical ones, little work has been done to quantify the errors that are associated with the measurements made. Thus, it is generally not possible to determine, in a statistically meaningful manner, whether a given chemical formula would likely be capable of producing the observed data. In this paper, we present a method for constructing confidence regions for the isotopic abundance patterns based on the fundamental distribution of the ion arrivals. Moreover, we develop a method for doing so that makes use of the information pooled together from the measurements obtained across an entire chromatographic peak, as well as from any adducts, dimers, and fragments observed in the mass spectra. This greatly increases the statistical power, thus enabling the analyst to rule out a potentially much larger number of candidate formulas while explicitly guarding against false positives. In practice, small departures from the model assumptions are possible due to detector saturation and interferences between adjacent isotopologues. While these factors form impediments to statistical rigor, they can to a large extent be overcome by restricting the analysis to moderate ion counts and by applying robust statistical methods. Using real metabolic data, we demonstrate that the method is capable of reducing the number of candidate formulas by a substantial amount, even when no bromine or chlorine atoms are present. We argue that further developments in our ability to characterize the data mathematically could enable much more powerful statistical analyses.
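The idea in the abstract of testing a candidate formula against pooled ion counts can be illustrated with a minimal sketch. This is not the authors' method. It assumes Poisson ion arrivals, so that, conditional on the total count, the isotopologue counts are multinomial, and it uses an asymptotic Pearson chi-square test on exactly three peaks (M, M+1, M+2). It ignores detector saturation, interferences, and the robust methods mentioned above. The function names, the scan counts, and the two theoretical patterns are hypothetical and chosen only for illustration.

```python
import math


def pooled_counts(scans):
    """Sum ion counts per isotopologue across scans.

    Each scan is a list [M, M+1, M+2] of ion counts (hypothetical layout).
    """
    return [sum(column) for column in zip(*scans)]


def pattern_p_value(counts, theoretical):
    """Pearson chi-square p-value for three isotopologue peaks.

    With three peaks the statistic has 2 degrees of freedom, and the
    chi-square survival function for 2 dof is exactly exp(-x / 2).
    """
    if len(counts) != 3 or len(theoretical) != 3:
        raise ValueError("this sketch handles exactly three peaks")
    total = sum(counts)
    norm = sum(theoretical)
    # Expected counts under the candidate's theoretical abundance pattern.
    expected = [total * t / norm for t in theoretical]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return math.exp(-stat / 2.0)


def in_confidence_region(counts, theoretical, alpha=0.05):
    """True if the candidate pattern is not rejected at level alpha."""
    return pattern_p_value(counts, theoretical) >= alpha


# Hypothetical counts from three scans across one chromatographic peak.
scans = [[900, 92, 8], [1800, 178, 22], [905, 88, 7]]
pooled = pooled_counts(scans)

# Two hypothetical candidate formulas with made-up theoretical patterns.
candidate_a = [0.90, 0.09, 0.01]
candidate_b = [0.85, 0.13, 0.02]

print(pooled)
print(in_confidence_region(pooled, candidate_a))  # candidate retained
print(in_confidence_region(pooled, candidate_b))  # candidate ruled out
```

Pooling the scans before testing raises the total ion count, which is what narrows the region and lets more candidates be excluded. A fuller treatment would also pool adducts, dimers, and fragments, as the abstract describes.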

SUBMITTER: Ipsen A

PROVIDER: S-EPMC2930401 | biostudies-other | 2010 Sep

REPOSITORIES: biostudies-other

Similar Datasets

Project description:Despite immense interest in the proteome as a source of biomarkers in cancer, mass spectrometry has yet to yield a clinically useful protein biomarker for tumor classification. To explore the potential of a particular class of mass spectrometry-based quantitation approaches, label-free alignment of liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) data sets, for the identification of biomarkers for acute leukemias, we asked whether a label-free alignment algorithm could distinguish known classes of leukemias on the basis of their proteomes. This approach to quantitation involves (1) computational alignment of MS1 peptide peaks across large numbers of samples; (2) measurement of the relative abundance of peptides across samples by integrating the area under the curve of the MS1 peaks; and (3) assignment of peptide IDs to those quantified peptide peaks on the basis of the corresponding MS2 spectra. We extracted proteins from blasts derived from four patients with acute myeloid leukemia (AML, acute leukemia of myeloid lineage) and five patients with acute lymphoid leukemia (ALL, acute leukemia of lymphoid lineage). Mobilized CD34+ cells purified from peripheral blood of six healthy donors and mononuclear cells (MNC) from the peripheral blood of two healthy donors were used as healthy controls. Proteins were analyzed by LC-MS/MS and quantified with a label-free alignment-based algorithm developed in our laboratory. Unsupervised hierarchical clustering of blinded samples separated the samples according to their known biological characteristics, with each sample group forming a discrete cluster. The four proteins best able to distinguish CD34+, AML, and ALL were all either known biomarkers or proteins whose biological functions are consistent with their ability to distinguish these classes. 
We conclude that alignment-based label-free quantitation of LC-MS/MS data sets can, at least in some cases, robustly distinguish known classes of leukemias, thus opening the possibility that large scale studies using such algorithms can lead to the identification of clinically useful biomarkers.
