Project description:Data quality in global metabolomics is of great importance for biomarker discovery and systems biology studies. However, comprehensive metrics and methods to evaluate and compare the data quality of global metabolomics data sets are lacking. In this work, we combine newly developed metrics, along with well-known measures, to comprehensively and quantitatively characterize the data quality across two similar liquid chromatography coupled with mass spectrometry (LC-MS) platforms, with the goal of providing an efficient and improved ability to evaluate the data quality in global metabolite profiling experiments. A pooled human serum sample was run 50 times on two high-resolution LC-QTOF-MS platforms to provide profile and centroid MS data. These data were processed using Progenesis QI software and then analyzed using five important data quality measures, including retention time drift, the number of compounds detected, missing values, and MS reproducibility (2 measures). The detected compounds were fit to a γ distribution versus compound abundance, which was normalized to allow comparison of different platforms. To evaluate missing values, characteristic curves were obtained by plotting the compound detection percentage versus extraction frequency. To characterize reproducibility, the accumulative coefficient of variation (CV) versus the percentage of total compounds detected and intraclass correlation coefficient (ICC) versus compound abundance were investigated. Key findings include significantly better performance using profile mode data compared to centroid mode as well as quantitatively better performance from the newer, higher resolution instrument. A summary table of results gives a snapshot of the experimental results and provides a template to evaluate the global metabolite profiling workflow.
In total, these measures give a good overall view of data quality in global profiling and allow comparisons of data acquisition strategies and platforms as well as optimization of parameters.
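The reproducibility and missing-value measures described above can be illustrated with a minimal Python sketch. It assumes a simple table of replicate abundances per compound, with None marking a missing value. The function names (cv_percent, cumulative_cv_curve, detection_percentage) are hypothetical and not taken from the study, and the cumulative curve shown is only one plausible reading of "accumulative CV versus the percentage of total compounds detected".

```python
from statistics import mean, stdev


def cv_percent(abundances):
    """Coefficient of variation (%) of one compound across replicate runs."""
    m = mean(abundances)
    if m == 0:
        raise ValueError("mean abundance is zero; CV is undefined")
    return 100.0 * stdev(abundances) / m


def cumulative_cv_curve(table, thresholds):
    """For each CV threshold, the percentage of compounds at or below it.

    table maps a compound name to its replicate abundances (no missing values).
    """
    cvs = [cv_percent(values) for values in table.values()]
    return [
        (t, 100.0 * sum(cv <= t for cv in cvs) / len(cvs))
        for t in thresholds
    ]


def detection_percentage(abundances):
    """Percentage of runs in which a compound was detected (value is not None)."""
    detected = sum(1 for v in abundances if v is not None)
    return 100.0 * detected / len(abundances)


# Hypothetical toy data: three compounds measured in three replicate runs.
table = {
    "compound_a": [100, 102, 98],
    "compound_b": [50, 75, 25],
    "compound_c": [10, 10, 10],
}
curve = cumulative_cv_curve(table, thresholds=[5, 20, 60])
```

Plotting the second element of each pair in curve against the threshold gives a curve whose steepness summarizes reproducibility: the faster it approaches 100%, the more compounds have low CV.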
Project description:Motivation: In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor quality peptide abundance data can bias downstream statistical analyses and hence the biological interpretation for an otherwise high-quality dataset. Although considerable effort has been placed on assuring the quality of the peptide identification with respect to spectral processing, to date quality assessment of the subsequent peptide abundance data matrix has been limited to a subjective visual inspection of run-by-run correlation or individual peptide components. Identifying statistical outliers is a critical step in the processing of proteomics data as many of the downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely upon accurate estimates of sample variance, and their results are influenced by extreme values. Results: We describe a novel multivariate statistical strategy for the identification of LC-MS runs with extreme peptide abundance distributions. Comparison with the current method (run-by-run correlation) demonstrates a significantly better rate of identification of outlier runs by the multivariate strategy. Simulation studies also suggest that this strategy significantly outperforms correlation alone in the identification of statistically extreme liquid chromatography-mass spectrometry (LC-MS) runs. Availability: https://www.biopilot.org/docs/Software/RMD.php Contact: bj@pnl.gov Supplementary information: Supplementary material is available at Bioinformatics online.
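For orientation, the baseline that the abstract compares against, run-by-run correlation, can be sketched in a few lines of Python. This is not the multivariate strategy proposed by the authors. The use of the median correlation, the 0.9 cutoff, and all function names are assumptions chosen for illustration, and the input is assumed to hold log abundances for the same peptides in the same order in every run.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equally long abundance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_run_correlation(runs):
    """Median correlation of each run with every other run.

    The median (an illustrative choice) keeps one bad run from dragging
    down the score of the good runs it is compared against.
    """
    scores = {}
    for name, values in runs.items():
        others = [pearson(values, v) for o, v in runs.items() if o != name]
        scores[name] = median(others)
    return scores


def flag_low_correlation_runs(runs, cutoff=0.9):
    """Names of runs whose median correlation falls below the cutoff."""
    scores = median_run_correlation(runs)
    return sorted(name for name, r in scores.items() if r < cutoff)


# Hypothetical toy data: four runs, five peptides each; run r4 is scrambled.
runs = {
    "r1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "r2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "r3": [0.9, 2.1, 2.9, 4.2, 4.8],
    "r4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = flag_low_correlation_runs(runs)
```

A single correlation score per run is exactly the kind of one-dimensional summary that the abstract argues is less sensitive than a multivariate assessment.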
Project description:A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets.
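Of the comparison methods named above, dynamic time warping is the simplest to illustrate. The Python sketch below is a textbook DTW between two one-dimensional intensity profiles, using absolute difference as the local cost. It is not BAM and not necessarily the DTW variant used in the comparison; the function name and the toy profiles are hypothetical.

```python
def dtw(a, b):
    """Classic dynamic time warping between two intensity profiles.

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both points
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Hypothetical toy profiles: the same peak, shifted by one scan.
reference = [0, 0, 1, 2, 1, 0]
shifted = [0, 1, 2, 1, 0, 0]
distance, path = dtw(reference, shifted)
```

The recovered path plays the role of a mapping function between the retention time axes of the two runs, which is the quantity profile-based alignment methods estimate.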
Project description:Blood microsampling (BµS) has recently emerged as an interesting approach in the analysis of endogenous metabolites as well as in metabolomics applications. Their non-invasive use and the simplified logistics that they offer render these technologies highly attractive in large-scale studies, especially the novel quantitative microsampling approaches such as VAMS or qDBS. Objectives: Herein, we investigate the potential of BµS devices compared to the conventional plasma samples used in global untargeted mass spectrometry-based metabolomics of blood. Methods: Two novel quantitative devices, namely Mitra and Capitainer, and the widely used Whatman cards were selected for comparison with plasma. Venous blood was collected from 10 healthy, overnight-fasted individuals and loaded on the devices; plasma was also collected from the same venous blood. An extraction solvent optimization study was first performed on the three devices before the main study, which compared the global metabolic profiles of the four extracts (three BµS devices and plasma). Analysis was conducted using reverse phase LC-TOF MS in positive mode. Results: BµS devices, especially Mitra and Capitainer, provided equal or even superior information on the metabolic profiling of human blood based on the number and intensity of features and the precision and stability of some annotated metabolites compared to plasma. Despite their rich metabolic profiles, BµS did not capture metabolites associated with biological differentiation of sexes. Conclusions: Overall, our results suggest that a more in-depth investigation of the acquired information is needed for each specific application, as a metabolite-dependent trend was obvious.
Project description:Background: It is possible to identify thousands of phosphopeptides and -proteins in a single experiment with mass spectrometry-based phosphoproteomics. However, a current bottleneck is the downstream data analysis which is often laborious and requires a number of manual steps. Results: Toward automating the analysis steps, we have developed and implemented a software, PhosFox, which enables peptide-level processing of phosphoproteomic data generated by multiple protein identification search algorithms, including Mascot, Sequest, and Paragon, as well as cross-comparison of their identification results. The software supports both qualitative and quantitative phosphoproteomics studies, as well as multiple between-group comparisons. Importantly, PhosFox detects uniquely phosphorylated peptides and proteins in one sample compared to another. It also distinguishes differences in phosphorylation sites between phosphorylated proteins in different samples. Using two case study examples, a qualitative phosphoproteome dataset from human keratinocytes and a quantitative phosphoproteome dataset from rat kidney inner medulla, we demonstrate here how PhosFox facilitates an efficient and in-depth phosphoproteome data analysis. PhosFox was implemented in the Perl programming language and it can be run on most common operating systems. Due to its flexible interface and open source distribution, the users can easily incorporate the program into their MS data analysis workflows and extend the program with new features. PhosFox source code, implementation and user instructions are freely available from https://bitbucket.org/phintsan/phosfox. Conclusions: PhosFox facilitates efficient and more in-depth comparisons between phosphoproteins in case-control settings. The open source implementation is easily extendable to accommodate additional features for widespread application use cases.
Project description:Despite immense interest in the proteome as a source of biomarkers in cancer, mass spectrometry has yet to yield a clinically useful protein biomarker for tumor classification. To explore the potential of a particular class of mass spectrometry-based quantitation approaches, label-free alignment of liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) data sets, for the identification of biomarkers for acute leukemias, we asked whether a label-free alignment algorithm could distinguish known classes of leukemias on the basis of their proteomes. This approach to quantitation involves (1) computational alignment of MS1 peptide peaks across large numbers of samples; (2) measurement of the relative abundance of peptides across samples by integrating the area under the curve of the MS1 peaks; and (3) assignment of peptide IDs to those quantified peptide peaks on the basis of the corresponding MS2 spectra. We extracted proteins from blasts derived from four patients with acute myeloid leukemia (AML, acute leukemia of myeloid lineage) and five patients with acute lymphoid leukemia (ALL, acute leukemia of lymphoid lineage). Mobilized CD34+ cells purified from peripheral blood of six healthy donors and mononuclear cells (MNC) from the peripheral blood of two healthy donors were used as healthy controls. Proteins were analyzed by LC-MS/MS and quantified with a label-free alignment-based algorithm developed in our laboratory. Unsupervised hierarchical clustering of blinded samples separated the samples according to their known biological characteristics, with each sample group forming a discrete cluster. The four proteins best able to distinguish CD34+, AML, and ALL were all either known biomarkers or proteins whose biological functions are consistent with their ability to distinguish these classes. 
We conclude that alignment-based label-free quantitation of LC-MS/MS data sets can, at least in some cases, robustly distinguish known classes of leukemias, thus opening the possibility that large scale studies using such algorithms can lead to the identification of clinically useful biomarkers.
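Step (2) of the quantitation approach above, integrating the area under an MS1 peak, can be sketched with the trapezoidal rule. This is a generic illustration in Python, not the algorithm developed in the authors' laboratory. The function name, the fixed retention time window, and the toy chromatograms are assumptions.

```python
def peak_area(retention_times, intensities, rt_start, rt_end):
    """Trapezoidal area under an MS1 peak within a retention time window.

    retention_times must be sorted in increasing order and paired
    one-to-one with intensities.
    """
    points = [
        (t, y)
        for t, y in zip(retention_times, intensities)
        if rt_start <= t <= rt_end
    ]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Hypothetical toy chromatograms for one aligned peptide in two samples.
rt = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_a = [0.0, 10.0, 20.0, 10.0, 0.0]
sample_b = [0.0, 5.0, 10.0, 5.0, 0.0]

area_a = peak_area(rt, sample_a, 0.0, 4.0)
area_b = peak_area(rt, sample_b, 0.0, 4.0)
relative_abundance = area_a / area_b
```

Because the peaks are aligned first, the same retention time window can be applied to every sample, so the ratio of areas serves as a relative abundance for that peptide.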
Project description:Summary: We introduce an open-source software, LIQUID, for semi-automated processing and visualization of LC-MS/MS-based lipidomics data. LIQUID provides users with the capability to process high throughput data and contains a customizable target library and scoring model per project needs. The graphical user interface provides visualization of multiple lines of spectral evidence for each lipid identification, allowing rapid examination of data for making confident identifications of lipid molecular species. LIQUID was compared to other freely available software commonly used to identify lipids and other small molecules (e.g. CFM-ID, MetFrag, GNPS, LipidBlast and MS-DIAL), and was found to have a faster processing time to arrive at a higher number of validated lipid identifications. Availability and implementation: LIQUID is available at http://github.com/PNNL-Comp-Mass-Spec/LIQUID . Contact: jennifer.kyle@pnnl.gov or thomas.metz@pnnl.gov. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Background: Global or untargeted metabolomics is widely used to comprehensively investigate metabolic profiles under various pathophysiological conditions such as inflammations, infections, responses to exposures or interactions with microbial communities. However, biological interpretation of global metabolomics data remains a daunting task. Recent years have seen growing applications of pathway enrichment analysis based on putative annotations of liquid chromatography coupled with mass spectrometry (LC-MS) peaks for functional interpretation of LC-MS-based global metabolomics data. However, due to intricate peak-metabolite and metabolite-pathway relationships, considerable variations are observed among results obtained using different approaches. There is an urgent need to benchmark these approaches to inform the best practices. Results: We have conducted a benchmark study of common peak annotation approaches and pathway enrichment methods in current metabolomics studies. Representative approaches, including three peak annotation methods and four enrichment methods, were selected and benchmarked under different scenarios. Based on the results, we have provided a set of recommendations regarding peak annotation, ranking metrics and feature selection. The mummichog approach performed better overall than the other approaches. We have observed that a ~30% annotation rate is sufficient to achieve high recall (~90% based on mummichog), and using semi-annotated data improves functional interpretation. Based on the current platforms and enrichment methods, we further propose an identifiability index to indicate the possibility of a pathway being reliably identified. Finally, we evaluated all methods using 11 COVID-19 and 8 inflammatory bowel disease (IBD) global metabolomics datasets.
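To make the idea of pathway enrichment concrete, the Python sketch below implements a generic over-representation test based on the hypergeometric distribution. It is a common textbook formulation and is not claimed to be mummichog or any of the four enrichment methods benchmarked in the study. The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, in_pathway, selected, overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    total      : number of annotated metabolites in the background
    in_pathway : background metabolites that belong to the pathway
    selected   : significant metabolites drawn from the background
    overlap    : significant metabolites that belong to the pathway

    Returns P(X >= overlap) when `selected` metabolites are drawn at
    random, without replacement, from the background.
    """
    upper = min(in_pathway, selected)
    denominator = comb(total, selected)
    numerator = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# Hypothetical toy case: 10 background metabolites, 3 in the pathway,
# 3 significant metabolites, all 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(total=10, in_pathway=3, selected=3, overlap=3)
```

In practice the inputs depend on how LC-MS peaks are mapped to metabolites, which is one reason different annotation approaches can give different enrichment results.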
Project description:Protein phosphorylation plays key roles in a variety of essential cellular processes. Fasciola gigantica is a tropical liver fluke that causes the hepatobiliary disease fascioliasis, leading to human health threats and heavy economic losses. Although the genome and protein kinases of F. gigantica have provided new insights into the molecular biology and etiology of this parasite, there is scant knowledge of protein phosphorylation events in F. gigantica. In this study, we characterized the global phosphoproteome of adult F. gigantica by phosphopeptide enrichment-based LC-MS/MS, a high-throughput analysis to maximize the detection of a large repertoire of phosphoproteins and phosphosites. A total of 1030 phosphopeptides with 1244 phosphosites representing 635 F. gigantica phosphoproteins were identified. The phosphoproteins were involved in a wide variety of biological processes including cellular, metabolic, and single-organism processes. Meanwhile, these proteins were found predominantly in cellular components like membranes and organelles with molecular functions of binding (51.3%) and catalytic activity (40.6%). The KEGG annotation inferred that the most enriched pathways of the phosphoproteins included tight junction, spliceosome, and RNA transport (each one contains 15 identified proteins). Taken together with reports from protozoa and other helminths, the phosphoproteins identified in this work play roles in metabolic regulation and signal transduction. To our knowledge, this work is the first global phosphoproteomic analysis of adult F. gigantica, and it provides valuable information for the development of intervention strategies for fascioliasis.