Project description:Metabolomics is the science of characterizing and quantifying small-molecule metabolites in biological systems. These metabolites give organisms their biochemical characteristics, providing a link between genotype, environment, and phenotype. With these opportunities also come data challenges, such as compound annotation, missing values, and batch effects. We present the steps of a general pipeline for processing untargeted mass spectrometry data that alleviates the latter two challenges. We assume a matrix of metabolite abundances, with metabolites in rows and samples in columns. The steps in the pipeline include summarizing technical replicates (if available), filtering, imputing, transforming, and normalizing the data. In each of these steps, a method and its parameters should be chosen based on the assumptions one is willing to make, the question of interest, and diagnostic tools. Besides giving a general pipeline that can be adapted by the reader, our goal is to review diagnostic tools and criteria that are helpful when making decisions in each step of the pipeline and when assessing the effectiveness of normalization and batch correction. We conclude by giving a list of useful packages and discussing some alternative approaches that might be more appropriate for the reader's data.
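The filtering, imputing, transforming, and normalizing steps named above can be illustrated with a minimal sketch. The specific choices here (a 50% missingness cutoff, half-minimum imputation, a log2 transform, and per-sample median centering) are assumptions picked for illustration, not the methods recommended by the abstract, and all function names are hypothetical. Missing values are represented as None, and observed abundances are assumed to be strictly positive.

```python
import math
from statistics import median


def filter_rows(matrix, max_missing=0.5):
    """Drop metabolites (rows) whose fraction of missing values exceeds max_missing."""
    return [row for row in matrix
            if sum(v is None for v in row) / len(row) <= max_missing]


def impute_half_min(matrix):
    """Replace each missing value with half the smallest observed value in its row.

    Assumes every row has at least one observed value, which holds after
    filter_rows as long as max_missing < 1.
    """
    imputed = []
    for row in matrix:
        fill = min(v for v in row if v is not None) / 2
        imputed.append([fill if v is None else v for v in row])
    return imputed


def log2_transform(matrix):
    """Log2-transform every abundance (values must be strictly positive)."""
    return [[math.log2(v) for v in row] for row in matrix]


def median_normalize(matrix):
    """Subtract each sample's (column's) median so all samples are centered at 0."""
    n_samples = len(matrix[0])
    col_medians = [median(row[j] for row in matrix) for j in range(n_samples)]
    return [[row[j] - col_medians[j] for j in range(n_samples)] for row in matrix]


def run_pipeline(matrix, max_missing=0.5):
    """Apply the steps in order: filter, impute, transform, normalize."""
    kept = filter_rows(matrix, max_missing)
    return median_normalize(log2_transform(impute_half_min(kept)))
```

Each step is a separate function so that one choice, for example the imputation rule, can be swapped without touching the others, which mirrors the abstract's point that every step involves its own decision.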
Project description:A Matlab-based computer program termed Discovery of General Endo- and Xenobiotics (DoGEX) was developed, which uses wavelets and morphological analysis to process liquid chromatography-mass spectrometry (LC-MS) data. The output of the program is a list of integration areas as a function of retention time and molecular mass. A feature of the computer program is spectral filtering to facilitate the detection of chromatographic peaks with a particular isotopic ratio. The program DoGEX was used to automatically select oxidation products formed from felodipine (which contains two chlorine atoms) and bromocriptine (one bromine atom) with cytochrome P450 3A4. The recognized isotope ratio can be changed so that a natural or artificial mixture of isotopes can be monitored for selection. This computer program can be used to analyze LC-MS data for untargeted metabolic profiling experiments, e.g., to assign endogenous functions to newly characterized cytochrome P450 enzymes. In a representative example, an incubation of testosterone, NADPH, and a 1:1 16O2/18O2 mixture yielded products with M and M+2 ions resembling bromine doublets. Another use of the program is the subtraction of one set of tR, m/z data from another, e.g., in comparisons of changes in patterns during enzyme reactions.
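The idea of filtering for a particular isotopic ratio can be sketched as a search for M and M+2 peak pairs. This is not the DoGEX implementation, which relies on wavelets and morphological analysis. It is a simplified pairwise search, and the function name, tolerances, and default values are assumptions. The default mass spacing of 1.998 Da approximates the difference between the two stable bromine isotopes, and the target ratio of 1.0 approximates their near-equal natural abundance.

```python
def find_isotope_doublets(peaks, mass_delta=1.998, mz_tol=0.01,
                          target_ratio=1.0, ratio_tol=0.2):
    """Return (light, heavy) peak pairs that look like an isotope doublet.

    peaks: list of (mz, intensity) tuples from a single spectrum.
    A pair qualifies when the m/z spacing is within mz_tol of mass_delta and
    the heavy/light intensity ratio is within ratio_tol of target_ratio.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(peaks):
        if int_light <= 0:
            continue  # a zero-intensity peak cannot anchor a ratio
        for mz_heavy, int_heavy in peaks[i + 1:]:
            spacing = mz_heavy - mz_light
            if spacing > mass_delta + mz_tol:
                break  # peaks are sorted, so later peaks are even farther away
            if abs(spacing - mass_delta) <= mz_tol:
                ratio = int_heavy / int_light
                if abs(ratio - target_ratio) <= ratio_tol:
                    pairs.append(((mz_light, int_light), (mz_heavy, int_heavy)))
    return pairs
```

Changing target_ratio and mass_delta corresponds to the abstract's remark that the recognized isotope ratio can be changed, for instance to follow an artificial 1:1 isotope mixture rather than a natural halogen pattern.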
Project description:Motivation: Untargeted mass spectrometry experiments enable the profiling of metabolites in complex biological samples. The collected fragmentation spectra are the metabolites' fingerprints that are used for molecule identification and discovery. Two main mass spectrometry strategies exist for the collection of fragmentation spectra: data-dependent acquisition (DDA) and data-independent acquisition (DIA). In the DIA strategy, all metabolite ions in predefined mass-to-charge ratio ranges are co-isolated and co-fragmented, resulting in multiplexed fragmentation spectra that are challenging to annotate. In contrast, in the DDA strategy, fragmentation spectra are dynamically and specifically collected for the most abundant ions observed, causing redundancy and sub-optimal fragmentation spectra collection. Yet, DDA results in less multiplexed fragmentation spectra that can be readily annotated. Results: We introduce the MS2Planner workflow, an Iterative Optimized Data Acquisition strategy that optimizes the number of high-quality fragmentation spectra over multiple experimental acquisitions using topological sorting. Our results showed that MS2Planner increases the annotation rate by 38.6% and is 62.5% more sensitive and 9.4% more specific compared to DDA. Availability and implementation: MS2Planner code is available at https://github.com/mohimanilab/MS2Planner. The generation of the inclusion list from MS2Planner was performed with Python scripts available at https://github.com/lfnothias/IODA_MS. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Background: Lipids are ubiquitous and serve numerous biological functions; thus lipids have been shown to have great potential as candidates for elucidating biomarkers and pathway perturbations associated with disease. Methods expanding coverage of the lipidome increase the likelihood of biomarker discovery and could lead to more comprehensive understanding of disease etiology. Results: We introduce LipidMatch, an R-based tool for lipid identification for liquid chromatography tandem mass spectrometry workflows. LipidMatch currently has over 250,000 lipid species spanning 56 lipid types contained in in silico fragmentation libraries. Unique fragmentation libraries, compared to other open source software, include oxidized lipids, bile acids, sphingosines, and previously uncharacterized adducts, including ammoniated cardiolipins. LipidMatch uses rule-based identification. For each lipid type, the user can select which fragments must be observed for identification. Rule-based identification allows for correct annotation of lipids based on the fragments observed, unlike typical identification based solely on spectral similarity scores, where over-reporting structural details that are not conferred by fragmentation data is common. Another unique feature of LipidMatch is ranking lipid identifications for a given feature by the sum of fragment intensities. For each lipid candidate, the intensities of experimental fragments with exact mass matches to expected in silico fragments are summed. The lipid identifications with the greatest summed intensity using this ranking algorithm were comparable to annotations from other lipid identification software, MS-DIAL and Greazy. 
For example, for features with identifications from all three software tools, 92% of LipidMatch identifications by fatty acyl constituents were corroborated by at least one other tool in positive ion mode and 98% in negative ion mode. Conclusions: LipidMatch allows users to annotate lipids across a wide range of high resolution tandem mass spectrometry experiments, including imaging experiments, direct infusion experiments, and experiments employing liquid chromatography. LipidMatch leverages the most extensive in silico fragmentation libraries of freely available software. When integrated into a larger lipidomics workflow, LipidMatch may increase the probability of finding lipid-based biomarkers and determining etiology of disease by covering a greater portion of the lipidome and using annotation which does not over-report biologically relevant structural details of identified lipid molecules.
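The rule-based identification and summed-intensity ranking described for LipidMatch can be sketched as follows. LipidMatch itself is R-based, so this Python sketch is not its code. The data layout, the function names, and the absolute mass tolerance are assumptions made for illustration. A candidate is kept only if all of its required fragments are observed, and surviving candidates are ranked by the summed intensity of experimental peaks that match any expected fragment.

```python
def is_observed(target_mz, spectrum, tol=0.005):
    """True if any experimental peak lies within tol of target_mz."""
    return any(abs(mz - target_mz) <= tol for mz, _ in spectrum)


def summed_matched_intensity(expected_mzs, spectrum, tol=0.005):
    """Sum the intensities of experimental peaks matching any expected fragment."""
    return sum(intensity for mz, intensity in spectrum
               if any(abs(mz - e) <= tol for e in expected_mzs))


def rank_candidates(candidates, spectrum, tol=0.005):
    """Rank lipid candidates for one feature.

    candidates: dict mapping a candidate name to
                {"required": [mz, ...], "optional": [mz, ...]}.
    spectrum:   list of (mz, intensity) tuples from the fragmentation spectrum.
    Returns (name, score) tuples sorted by descending summed intensity,
    excluding candidates that miss a required fragment.
    """
    scored = []
    for name, fragments in candidates.items():
        if all(is_observed(mz, spectrum, tol) for mz in fragments["required"]):
            expected = fragments["required"] + fragments["optional"]
            scored.append((name, summed_matched_intensity(expected, spectrum, tol)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Separating the required-fragment rule from the intensity score keeps the two ideas from the abstract distinct: the rule decides whether an annotation is allowed at all, and the score only orders the annotations that pass.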
Project description:Matrix-assisted laser desorption/ionization mass spectrometry imaging allows for the study of metabolic activity in the tumor microenvironment of brain cancers. The detectable metabolites within these tumors are contingent upon the choice of matrix, deposition technique, and polarity setting. In this study, we compared the performance of three different matrices, two deposition techniques, and the use of positive and negative polarity in two different brain cancer types and across two species. Optimal combinations were confirmed by a comparative analysis of lipid and small-molecule abundance by using liquid chromatography-mass spectrometry and RNA sequencing to assess differential metabolites and enzymes between normal and tumor regions. Our findings indicate that in the tumor-bearing brain, the recrystallized α-cyano-4-hydroxycinnamic acid matrix with positive polarity offered superior performance for both detected metabolites and consistency with other techniques. Beyond these implications for brain cancer, our work establishes a workflow to identify optimal matrices for spatial metabolomics studies.
Project description:Motivation: Surface-enhanced laser desorption and ionization (SELDI) time of flight (TOF) is a mass spectrometry technology. The key features in a mass spectrum are its peaks. In order to locate the peaks and quantify their intensities, several pre-processing steps are required. Though different approaches to perform pre-processing have been proposed, there is no systematic study that compares their performance. Results: In this article, we present the results of a systematic comparison of various popular packages for pre-processing of SELDI-TOF data. We evaluate their performance in terms of two of their primary functions: peak detection and peak quantification. Regarding peak quantification, the performance of the algorithms is measured in terms of reproducibility. For peak detection, the comparison is based on sensitivity and false discovery rate. Our results show that for spectra generated with low laser intensity, the software developed by Ciphergen Biosystems (ProteinChip Software 3.1 with the additional tool Biomarker Wizard) produces relatively good results for both peak quantification and detection. On the other hand, for the data produced with either medium or high laser intensity, none of the methods shows uniformly better performance under both criteria. Our analysis suggests that an advantageous combination is the use of the packages MassSpecWavelet and PROcess, the former for peak detection and the latter for peak quantification.
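The two peak detection criteria named above, sensitivity and false discovery rate, can be computed as in the sketch below. The abstract does not state how detected peaks were matched to true peaks, so the matching rule here (a detected peak counts as correct if it lies within a fixed m/z tolerance of any true peak) is an assumption, as are the function name and the default tolerance.

```python
def evaluate_peak_detection(true_mz, detected_mz, tol=0.5):
    """Return (sensitivity, false_discovery_rate) for one spectrum.

    sensitivity: fraction of true peaks with a detected peak within tol.
    false discovery rate: fraction of detected peaks with no true peak within tol.
    Both default to 0.0 when the corresponding list is empty.
    """
    found_true = sum(
        any(abs(t - d) <= tol for d in detected_mz) for t in true_mz
    )
    false_detections = sum(
        not any(abs(d - t) <= tol for t in true_mz) for d in detected_mz
    )
    sensitivity = found_true / len(true_mz) if true_mz else 0.0
    fdr = false_detections / len(detected_mz) if detected_mz else 0.0
    return sensitivity, fdr
```

Reporting both numbers together matters because a detector can raise its sensitivity simply by calling more peaks, which shows up as a higher false discovery rate.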
Project description:Background: In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorithms is yet available. The main objective of this paper is to provide such a survey and to compare the performance of single spectrum based peak detection methods. Results: In general, we can decompose a peak detection procedure into three consecutive parts: smoothing, baseline correction and peak finding. We first categorize existing peak detection algorithms according to the techniques used in different phases. Such a categorization reveals the differences and similarities among existing peak detection algorithms. Then, we choose five typical peak detection algorithms to conduct a comprehensive experimental study using both simulated data and real MALDI MS data. Conclusion: The results of the comparison show that the continuous wavelet-based algorithm provides the best average performance.
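The three-part decomposition (smoothing, baseline correction, peak finding) can be sketched with deliberately simple techniques. This sketch uses a moving average, a rolling-minimum baseline, and a local-maximum search. It is not the continuous wavelet-based algorithm that the comparison found to perform best, and the function names, window sizes, and height threshold are assumptions.

```python
def moving_average(signal, window=3):
    """Smooth by averaging each point with its neighbours (window shrinks at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def subtract_baseline(signal, window=9):
    """Estimate the baseline as a rolling minimum and subtract it.

    The window should be wider than the widest peak, otherwise the
    baseline estimate rises under the peak and flattens it.
    """
    half = window // 2
    corrected = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        corrected.append(signal[i] - min(signal[lo:hi]))
    return corrected


def find_local_maxima(signal, min_height=1.0):
    """Return indices of interior points that are local maxima of at least min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]
            and signal[i] >= min_height]


def detect_peaks(raw, smooth_window=3, baseline_window=9, min_height=1.0):
    """Run the three phases in order and return the peak indices."""
    smoothed = moving_average(raw, smooth_window)
    corrected = subtract_baseline(smoothed, baseline_window)
    return find_local_maxima(corrected, min_height)
```

Because each phase is its own function, a different technique can be substituted in one phase while the others stay fixed, which is the kind of categorization by phase that the survey describes.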
Project description:Many solutes have been reported to remain at higher plasma levels relative to normal than the standard index solute urea in hemodialysis patients. Untargeted mass spectrometry was employed to compare solute levels in plasma and plasma ultrafiltrate of hemodialysis patients and normal subjects. Quantitative assays were employed to check the accuracy of untargeted results for selected solutes and additional measurements were made in dialysate and urine to estimate solute clearances and production. Comparison of peak areas indicated that many solutes accumulated to high levels in hemodialysis patients, with average peak areas in plasma ultrafiltrate of dialysis patients being more than 100 times greater than those in normals for 123 features. Most of these mass spectrometric features were identified only by their mass values. Untargeted analysis correctly ranked the accumulation of 5 solutes which were quantitatively assayed but tended to overestimate its extent. Mathematical modeling showed that the elevation of plasma levels for these solutes could be accounted for by a low dialytic to native kidney clearance ratio and a high dialytic clearance relative to the volume of the accessible compartment. Numerous solutes accumulate to high levels in hemodialysis patients because dialysis does not replicate the clearance provided by the native kidney. Many of these solutes remain to be chemically identified and their pathogenic potential elucidated.
Project description:The broad coverage of untargeted metabolomics poses fundamental challenges for the harmonization of measurements over time, even if they originate from the very same instrument. Internal isotopic standards can hardly cover the chemical complexity of study samples. Therefore, they are insufficient for normalizing data a posteriori as done for targeted metabolomics. Instead, it is crucial to verify the instrument's performance a priori, that is, before samples are injected. Here, we propose a system suitability testing platform for time-of-flight mass spectrometers independent of liquid chromatography. It includes a chemically defined quality control mixture, a fast acquisition method, software for extracting ca. 3,000 numerical features from profile data, and a simple web service for monitoring. We ran a pilot for 21 months and present illustrative results for anomaly detection and for learning causal relationships between the spectral features and machine settings. Beyond mere detection of anomalies, our results highlight several future applications such as 1) recommending instrument retuning strategies to achieve desired values of quality indicators, 2) driving preventive maintenance, and 3) using the obtained, detailed spectral features for posterior data harmonization.
Project description:The diverse characteristics and large number of entities make metabolite separation challenging in metabolomics. To date, there is no single instrument capable of analyzing all types of metabolites. In order to achieve a better separation for higher peak capacity and accurate metabolite identification and quantification, we integrated GC × GC-MS and parallel 2DLC-MS for analysis of polar metabolites. To test the performance of the developed system, 13 rats were fed different diets to form two animal groups. Polar metabolites extracted from rat livers were analyzed by GC × GC-MS, parallel 2DLC-MS (-) and parallel 2DLC-MS (+), respectively. By integrating all data together, 58 metabolites were detected with a significant change in their abundance levels between groups (p ≤ 0.05). Of the 58 metabolites, three metabolites were detected in two platforms and two in all three platforms. Manual examination showed that discrepancies in metabolite regulation measured by different platforms were mainly caused by the poor shape of chromatographic peaks resulting from low instrument response. Pathway analysis demonstrated that integrating the results from multiple platforms increased the confidence of metabolic pathway assignment.