Dataset Information

Pure Ion Chromatograms Combined with Advanced Machine Learning Methods Improve Accuracy of Discriminant Models in LC-MS-Based Untargeted Metabolomics.

ABSTRACT: Untargeted metabolomics based on liquid chromatography coupled with mass spectrometry (LC-MS) can detect thousands of features in samples and produce highly complex datasets. The accurate extraction of meaningful features and the building of discriminant models are two crucial steps in the data analysis pipeline of untargeted metabolomics. In this study, pure ion chromatograms were extracted from a liquor dataset and a left-sided colon cancer (LCC) dataset by the K-means-clustering-based Pure Ion Chromatogram extraction method version 2.0 (KPIC2). Nonlinear low-dimensional embedding by uniform manifold approximation and projection (UMAP) then showed the separation of samples from different groups in reduced dimensions. Discriminant models were established by extreme gradient boosting (XGBoost) based on the features extracted by KPIC2. Models built on KPIC2 features achieved 100% classification accuracy on the test sets of both the liquor dataset and the LCC dataset, compared with 92% and 96%, respectively, for models built on XCMS features, which supports the use of XGBoost with KPIC2. XGBoost also performed better than the linear method and traditional nonlinear modeling methods on these datasets. UMAP and XGBoost have been integrated into the KPIC2 package to extend its performance in complex situations; together they can effectively process nonlinear datasets and can greatly improve the accuracy of data analysis in untargeted metabolomics.

SUBMITTER: Tian M

PROVIDER: S-EPMC8125400 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Automated Annotation of Untargeted All-Ion Fragmentation LC-MS Metabolomics Data with MetaboAnnotatoR.

Project description:Untargeted metabolomics and lipidomics LC-MS experiments produce complex datasets, usually containing tens of thousands of features from thousands of metabolites whose annotation requires additional MS/MS experiments and expert knowledge. All-ion fragmentation (AIF) LC-MS/MS acquisition provides fragmentation data at no additional experimental time cost. However, analysis of such datasets requires reconstruction of parent-fragment relationships and annotation of the resulting pseudo-MS/MS spectra. Here, we propose a novel approach for automated annotation of isotopologues, adducts, and in-source fragments from AIF LC-MS datasets by combining correlation-based parent-fragment linking with molecular fragment matching. Our workflow focuses on a subset of features rather than trying to annotate the full dataset, saving time and simplifying the process. We demonstrate the workflow in three human serum datasets containing 599 features manually annotated by experts. Precision and recall values of 82-92% and 82-85%, respectively, were obtained for features found in the highest-rank scores (1-5). These results equal or outperform those obtained using MS-DIAL software, the current state of the art for AIF data annotation. Further validation for other biological matrices and different instrument types showed variable precision (60-89%) and recall (10-88%) particularly for datasets dominated by nonlipid metabolites. The workflow is freely available as an open-source R package, MetaboAnnotatoR, together with the fragment libraries from Github (https://github.com/gggraca/MetaboAnnotatoR).

| S-EPMC8892435 | biostudies-literature

Filtering procedures for untargeted LC-MS metabolomics data.

Project description: Background: Untargeted metabolomics datasets contain large proportions of uninformative features that can impede subsequent statistical analysis such as biomarker discovery and metabolic pathway analysis. Thus, there is a need for versatile and data-adaptive methods for filtering data prior to investigating the underlying biological phenomena. Here, we propose a data-adaptive pipeline for filtering metabolomics data that are generated by liquid chromatography-mass spectrometry (LC-MS) platforms. Our data-adaptive pipeline includes novel methods for filtering features based on blank samples, proportions of missing values, and estimated intra-class correlation coefficients. Results: Using metabolomics datasets that were generated in our laboratory from samples of human blood, as well as two public LC-MS datasets, we compared our data-adaptive filtering method with traditional methods that rely on non-method specific thresholds. The data-adaptive approach outperformed traditional approaches in terms of removing noisy features and retaining high quality, biologically informative ones. The R code for running the data-adaptive filtering method is provided at https://github.com/courtneyschiffman/Metabolomics-Filtering. Conclusions: Our proposed data-adaptive filtering pipeline is intuitive and effectively removes uninformative features from untargeted metabolomics datasets. It is particularly relevant for interrogation of biological phenomena in data derived from complex matrices associated with biospecimens.

| S-EPMC6570933 | biostudies-literature
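
As a rough illustration of the two simplest filters named in the description above (blank samples and proportion of missing values), the sketch below applies fixed cutoffs in Python. This is the traditional fixed-threshold style of filtering, not the data-adaptive method proposed in the paper, and the intra-class correlation filter is left out. The function names, the cutoff values (`max_missing`, `min_blank_ratio`), and the example intensities are all hypothetical.

```python
from typing import Dict, List, Optional

Intensities = List[Optional[float]]  # None marks a missing value


def missing_fraction(values: Intensities) -> float:
    """Share of samples in which the feature was not detected (values must be non-empty)."""
    return sum(v is None for v in values) / len(values)


def mean_observed(values: Intensities) -> float:
    """Mean of the detected intensities, or 0.0 if nothing was detected."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else 0.0


def filter_features(
    samples: Dict[str, Intensities],
    blanks: Dict[str, Intensities],
    max_missing: float = 0.5,      # assumed cutoff, not taken from the paper
    min_blank_ratio: float = 3.0,  # assumed cutoff, not taken from the paper
) -> List[str]:
    """Keep features that are mostly detected and clearly above blank level."""
    kept = []
    for name, values in samples.items():
        # Missing-value filter: drop features absent from too many samples.
        if missing_fraction(values) > max_missing:
            continue
        # Blank filter: drop features whose mean signal is close to the blanks.
        blank_mean = mean_observed(blanks.get(name, []))
        if blank_mean > 0 and mean_observed(values) < min_blank_ratio * blank_mean:
            continue
        kept.append(name)
    return kept


# Hypothetical feature table: three features measured in four samples and two blanks.
SAMPLES = {
    "f1": [100.0, 120.0, None, 110.0],
    "f2": [None, None, None, 50.0],
    "f3": [10.0, 12.0, 11.0, 9.0],
}
BLANKS = {"f1": [5.0, 6.0], "f2": [1.0, 1.0], "f3": [8.0, 9.0]}

KEPT = filter_features(SAMPLES, BLANKS)
```

With these made-up numbers, `f2` is dropped for missingness and `f3` for sitting too close to its blank level. A data-adaptive pipeline would instead derive the cutoffs from the data.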

Network Marker Selection for Untargeted LC-MS Metabolomics Data.

Project description:Untargeted metabolomics using high-resolution liquid chromatography-mass spectrometry (LC-MS) is becoming one of the major areas of high-throughput biology. Functional analysis, that is, analyzing the data based on metabolic pathways or the genome-scale metabolic network, is critical in feature selection and interpretation of metabolomics data. One of the main challenges in the functional analyses is the lack of the feature identity in the LC-MS data itself. By matching mass-to-charge ratio (m/z) values of the features to theoretical values derived from known metabolites, some features can be matched to one or more known metabolites. When multiple matchings occur, in most cases only one of the matchings can be true. At the same time, some known metabolites are missing in the measurements. Current network/pathway analysis methods ignore the uncertainty in metabolite identification and the missing observations, which could lead to errors in the selection of significant subnetworks/pathways. In this paper, we propose a flexible network feature selection framework that combines metabolomics data with the genome-scale metabolic network. The method adopts a sequential feature screening procedure and machine learning-based criteria to select important subnetworks and identify the optimal feature matching simultaneously. Simulation studies show that the proposed method has a much higher sensitivity than the commonly used maximal matching approach. For demonstration, we apply the method on a cohort of healthy subjects to detect subnetworks associated with the body mass index (BMI). The method identifies several subnetworks that are supported by the current literature, as well as detects some subnetworks with plausible new functional implications. The R code is available at http://web1.sph.emory.edu/users/tyu8/MSS.

| S-EPMC5441461 | biostudies-literature
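
The m/z matching step described above, and the ambiguity it creates, can be sketched in a few lines of Python. This is only an illustration of matching observed m/z values to theoretical ones, not the network selection method from the paper. The adduct ([M+H]+), the 10 ppm tolerance, the function names, and the small metabolite table are assumptions made for the example.

```python
from typing import Dict, List

PROTON_MASS = 1.007276  # Da; added to a neutral mass to model the [M+H]+ ion

# Hypothetical lookup table of monoisotopic neutral masses (Da).
METABOLITES: Dict[str, float] = {
    "leucine": 131.094629,     # C6H13NO2
    "isoleucine": 131.094629,  # C6H13NO2, an isomer of leucine
    "glucose": 180.063388,     # C6H12O6
}


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_features(
    feature_mz: List[float],
    metabolites: Dict[str, float],
    tol_ppm: float = 10.0,  # assumed tolerance
) -> Dict[float, List[str]]:
    """Map each observed m/z to every metabolite whose [M+H]+ lies within tol_ppm."""
    matches: Dict[float, List[str]] = {}
    for mz in feature_mz:
        hits = [
            name
            for name, neutral_mass in metabolites.items()
            if abs(ppm_error(mz, neutral_mass + PROTON_MASS)) <= tol_ppm
        ]
        matches[mz] = sorted(hits)
    return matches


# Three hypothetical observed features.
RESULT = match_features([132.1019, 181.0707, 250.0], METABOLITES)
```

In this toy table the feature at m/z 132.1019 matches both leucine and isoleucine, which is the multiple-matching case the description refers to, while the feature at m/z 250.0 matches nothing.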

Deep annotation of untargeted LC-MS metabolomics data with Binner.

Project description: Motivation: When metabolites are analyzed by electrospray ionization (ESI)-mass spectrometry, they are usually detected as multiple ion species due to the presence of isotopes, adducts and in-source fragments. The signals generated by these degenerate features (along with contaminants and other chemical noise) obscure meaningful patterns in MS data, complicating both compound identification and downstream statistical analysis. To address this problem, we developed Binner, a new tool for the discovery and elimination of many degenerate feature signals typically present in untargeted ESI-LC-MS metabolomics data. Results: Binner generates feature annotations and provides tools to help users visualize informative feature relationships that can further elucidate the underlying structure of the data. To demonstrate the utility of Binner and to evaluate its performance, we analyzed data from reversed phase LC-MS and hydrophilic interaction chromatography (HILIC) platforms and demonstrated the accuracy of selected annotations using MS/MS. When we compared Binner annotations of 75 compounds previously identified in human plasma samples with annotations generated by three similar tools, we found that Binner achieves superior performance in the number and accuracy of annotations while simultaneously minimizing the number of incorrectly annotated principal ions. Data reduction and pattern exploration with Binner have allowed us to catalog a number of previously unrecognized complex adducts and neutral losses generated during the ionization of molecules in LC-MS. In summary, Binner allows users to explore patterns in their data and to efficiently and accurately eliminate a significant number of the degenerate features typically found in various LC-MS modalities. Availability and implementation: Binner is written in Java and is freely available from http://binner.med.umich.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

| S-EPMC7828469 | biostudies-literature

Peak Annotation and Verification Engine for Untargeted LC-MS Metabolomics.

Project description: Untargeted metabolomics can detect more than 10 000 peaks in a single LC-MS run. The correspondence between these peaks and metabolites, however, remains unclear. Here, we introduce a Peak Annotation and Verification Engine (PAVE) for annotating untargeted microbial metabolomics data. The workflow involves growing cells in 13C and 15N isotope-labeled media to identify peaks from biological compounds and their carbon and nitrogen atom counts. Improved deisotoping and deadducting are enabled by algorithms that integrate positive mode, negative mode, and labeling data. To distinguish metabolites and their fragments, PAVE experimentally measures the response of each peak to weak in-source collision induced dissociation, which increases the peak intensity for fragments while decreasing it for their parent ions. The molecular formulas of the putative metabolites are then assigned based on database searching using both m/z and C/N atom counts. Application of this procedure to Saccharomyces cerevisiae and Escherichia coli revealed that more than 80% of peaks do not label, i.e., are environmental contaminants. More than 70% of the biological peaks are isotopic variants, adducts, fragments, or mass spectrometry artifacts yielding ∼2000 apparent metabolites across the two organisms. About 650 match to a known metabolite formula based on m/z and C/N atom counts, with 220 assigned structures based on MS/MS and/or retention time to match to authenticated standards. Thus, PAVE enables systematic annotation of LC-MS metabolomics data with only ∼4% of peaks annotated as apparent metabolites.

| S-EPMC6501219 | biostudies-literature

MARS: A Multipurpose Software for Untargeted LC-MS-Based Metabolomics and Exposomics.

Project description: Untargeted metabolomics is a growing field, in which recent advances in high-resolution mass spectrometry coupled with liquid chromatography (LC-MS) have facilitated untargeted approaches as a result of improvements in sensitivity, mass accuracy, and resolving power. However, a very large amount of data is generated. Consequently, using computational tools is now mandatory for the in-depth analysis of untargeted metabolomics data. This article describes MetAbolomics ReSearch (MARS), an all-in-one vendor-agnostic graphical user interface-based software applying LC-MS analysis to untargeted metabolomics. All of the analytical steps are described (from instrument data conversion and processing to statistical analysis, annotation/identification, quantification, and preliminary biological interpretation), and tools developed to improve annotation accuracy (e.g., multiple adducts and in-source fragmentation detection, trends across samples, and the MS/MS validator) are highlighted. In addition, MARS allows in-house building of reference databases, to bypass the limits of freely available MS/MS spectra collections. Focusing on the flexibility of the software and its user-friendliness, which are two important features in multipurpose software, MARS could provide new perspectives in untargeted metabolomics data analysis.

| S-EPMC10831794 | biostudies-literature

Untargeted LC-MS/MS analysis reveals metabolomics feature of osteosarcoma stem cell response to methotrexate.

Project description: Background: Cancer stem cells (CSCs) have been identified in osteosarcoma (OS) and are considered resistant to chemotherapeutic agents. However, the mechanism by which osteosarcoma stem cells (OSCs) resist chemotherapy remains debatable and vague, and the metabolomics features of OSCs have not been clarified. Materials and methods: OSCs were isolated using a sphere-forming assay and identified. Untargeted LC-MS/MS analysis was performed to reveal the metabolomics features of OSCs and the underlying mechanisms of OSC resistance to methotrexate (MTX). Results: OSCs were efficiently isolated and identified from human OS 143B and MG63 cell lines and showed enhanced chemo-resistance to MTX. The untargeted LC-MS analysis revealed that OSCs showed differential metabolites and perturbed signaling pathways, mainly involved in the metabolism of fatty acids, amino acids, carbohydrates, and nucleic acids. After treatment with MTX, the metabolomics features of OSCs mainly involved the metabolism of amino acids, fatty acids, energy, and nucleic acids. Moreover, compared with the response of their parental OS cells to MTX, the differential metabolites and perturbed signaling pathways were mainly involved in the metabolism of amino acids, fatty acids, and nucleic acids. In addition, the Rap1 and Ras signaling pathways were involved in the response of OS cells and their stem cells to MTX. Conclusion: The sphere-forming assay efficiently isolated OSCs from human OS cell lines, and untargeted LC-MS/MS analysis appears to be a sufficient methodology for investigating the metabolomics features of OS cells and OSCs. Moreover, the metabolomics features of the OSC response to MTX might provide further understanding of chemotherapeutic resistance in OS.

| S-EPMC7313215 | biostudies-literature

An LC-QToF MS based method for untargeted metabolomics of human fecal samples.

Project description: Introduction: Consensus in sample preparation for untargeted human fecal metabolomics is lacking. Objectives: To obtain sample preparation with broad metabolite coverage for high-throughput LC-MS. Methods: Extraction solvent, solvent ratio and fresh frozen-vs-lyophilized samples were evaluated by metabolite feature quality. Results: Methanol at 5 mL per g wet feces provided a wide metabolite coverage with optimal balance between signal intensity and saturation for both fresh frozen and lyophilized samples. Lyophilization did not affect SCFA and is recommended because of convenience in normalizing to dry matter. Conclusion: The suggested sample preparation is simple, efficient and suitable for large-scale human fecal metabolomics.

| S-EPMC7125068 | biostudies-literature

A Python-Based Pipeline for Preprocessing LC-MS Data for Untargeted Metabolomics Workflows.

Project description:Preprocessing data in a reproducible and robust way is one of the current challenges in untargeted metabolomics workflows. Data curation in liquid chromatography-mass spectrometry (LC-MS) involves the removal of biologically non-relevant features (retention time, m/z pairs) to retain only high-quality data for subsequent analysis and interpretation. The present work introduces TidyMS, a package for the Python programming language for preprocessing LC-MS data for quality control (QC) procedures in untargeted metabolomics workflows. It is a versatile strategy that can be customized or fit for purpose according to the specific metabolomics application. It allows performing quality control procedures to ensure accuracy and reliability in LC-MS measurements, and it allows preprocessing metabolomics data to obtain cleaned matrices for subsequent statistical analysis. The capabilities of the package are shown with pipelines for an LC-MS system suitability check, system conditioning, signal drift evaluation, and data curation. These applications were implemented to preprocess data corresponding to a new suite of candidate plasma reference materials developed by the National Institute of Standards and Technology (NIST; hypertriglyceridemic, diabetic, and African-American plasma pools) to be used in untargeted metabolomics studies in addition to NIST SRM 1950 Metabolites in Frozen Human Plasma. The package offers a rapid and reproducible workflow that can be used in an automated or semi-automated fashion, and it is an open and free tool available to all users.

| S-EPMC7602939 | biostudies-literature

Untargeted and Targeted LC-MS/MS Based Metabolomics Study on In Vitro Culture of Phaeoacremonium Species.

Project description: Grapevine (Vitis vinifera L.) can be affected by many different biotic agents, including tracheomycotic fungi such as Phaeomoniella chlamydospora and Phaeoacremonium minimum, which are the main causal agents of Esca and Petri diseases. Both fungi produce phytotoxic naphthalenone polyketides, namely scytalone and isosclerone, that are related to symptom development. The main objective of this study was to investigate the secondary metabolites produced by three Phaeoacremonium species and to assess their phytotoxicity by in vitro bioassay. To this aim, untargeted and targeted LC-MS/MS-based metabolomics were performed. A high-resolution UHPLC-Orbitrap mass spectrometer was used for the untargeted profiling and dereplication of secondary metabolites. A sensitive multiple reaction monitoring (MRM) method for the absolute quantification of scytalone and isosclerone was developed on a UPLC-QTrap. Different isolates of P. italicum, P. alvesii and P. rubrigenum were grown in vitro and the culture filtrates and organic extracts were assayed for phytotoxicity. The toxic effects varied within and among fungal isolates. Isosclerone and scytalone were dereplicated by matching retention times and HRMS and MS/MS data with pure standards. The amount of scytalone and isosclerone differed within and among fungal species. To the best of our knowledge, this is the first study that applies an LC-MS/MS-based metabolomics approach to investigate differences in the metabolic composition of organic extracts of Phaeoacremonium species culture filtrates.

| S-EPMC8780456 | biostudies-literature
