Dataset Information

A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures.

ABSTRACT: Retention index (RI) is useful for metabolite identification. However, when RI is combined with mass spectral similarity for metabolite identification, conflicting RI threshold settings have been reported in the literature. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate compound identification accuracy on the test dataset. To imitate real identification experiments, the NIST MS main library was used as the reference library and the test dataset as the search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively, when RI and mass spectral similarity are integrated for compound identification. Compared with mass spectrum matching alone, using both RI and mass spectral matching improves identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. We conclude that the improvement RI matching brings to compound identification depends heavily on the mass spectral similarity measure used and on the accuracy of the RI data.
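The abstract describes identification as spectral similarity ranking combined with an RI window. The sketch below illustrates that idea under loudly stated assumptions. Cosine similarity stands in for all three measures (it is not the NIST composite measure). The "dft_real" and "dwt_detail" options are simplified readings of DFT.R and DWT.D: the real part of NumPy's FFT, and a single-level Haar detail transform. The paper's exact formulations, wavelet, and weighting may differ. The names `transform`, `cosine`, and `identify` and the toy library entries are hypothetical, and spectra are assumed to be binned onto a common m/z grid already.

```python
import numpy as np


def transform(spectrum, method="raw"):
    """Map an intensity vector into the space where similarity is computed.

    Assumptions (not taken from the paper):
      "dft_real"   -> real part of the discrete Fourier transform (DFT.R-like)
      "dwt_detail" -> single-level Haar detail coefficients (DWT.D-like)
    """
    x = np.asarray(spectrum, dtype=float)
    if method == "raw":
        return x
    if method == "dft_real":
        return np.fft.fft(x).real
    if method == "dwt_detail":
        if x.size % 2:
            x = np.append(x, 0.0)  # pad to even length for the pairwise Haar step
        return (x[0::2] - x[1::2]) / np.sqrt(2.0)
    raise ValueError(f"unknown method: {method}")


def cosine(a, b):
    """Cosine similarity; a stand-in for the paper's similarity measures."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def identify(query_spec, query_ri, library, ri_threshold=None, method="raw"):
    """Return the best-matching library name, or None if no entry survives.

    library: iterable of (name, retention_index, intensity_vector) tuples.
    If ri_threshold is set, entries whose RI differs from query_ri by more
    than ri_threshold index units are discarded before spectral ranking.
    """
    q = transform(query_spec, method)
    best_name, best_score = None, -np.inf
    for name, ri, spec in library:
        if ri_threshold is not None and abs(ri - query_ri) > ri_threshold:
            continue  # hard RI window: candidate is never scored
        score = cosine(q, transform(spec, method))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Toy library: B's spectrum matches the query exactly, but its RI is far off.
    lib = [("A", 1200.0, [10, 0, 5, 0]), ("B", 1500.0, [9, 0, 6, 0])]
    print(identify([9, 0, 6, 0], 1205.0, lib))                   # B (spectrum only)
    print(identify([9, 0, 6, 0], 1205.0, lib, ri_threshold=22))  # A (RI window applied)
```

Here the RI window is a hard pre-filter, so the threshold decides which candidates are ever scored. That is why a threshold that is too tight can discard the true compound, and one that is too loose adds little. Plugging in the abstract's reported values (22 i.u. for the composite measure, 15 i.u. for DFT.R and DWT.D) only changes `ri_threshold`. Whether the paper applies RI as a hard filter or as a penalty term is not stated here.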

SUBMITTER: Zhang J

PROVIDER: S-EPMC3430127 | biostudies-literature | 2012 Aug

REPOSITORIES: biostudies-literature

Publications

A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures.

Jun Zhang, Imhoi Koo, Bing Wang, Qing-Wei Gao, Chun-Hou Zheng, Xiang Zhang

Journal of Chromatography A, 2012-06-19

PMID: 22771253

Similar Datasets

Comparative analysis of mass spectral similarity measures on peak alignment for comprehensive two-dimensional gas chromatography mass spectrometry.
| S-EPMC3787630 | biostudies-literature

Endocrine Sensitivity Index Validation Dataset
2010-08-13 | GSE17705 | GEO

Measures of Neural Similarity.
| S-EPMC7671987 | biostudies-literature

A medoid-based deviation ratio index to determine the number of clusters in a dataset.
| S-EPMC10011427 | biostudies-literature

Endocrine Sensitivity Index Validation Dataset
2010-08-13 | E-GEOD-17705 | biostudies-arrayexpress

Variation in multicomponent recognition cues alters egg rejection decisions: a test of the optimal acceptance threshold hypothesis.
| S-EPMC6388043 | biostudies-literature

Spectral clustering based on learning similarity matrix.
| S-EPMC6454479 | biostudies-literature

Similarity measures for protein ensembles.
| S-EPMC2615214 | biostudies-literature

Clustering biological sequences with dynamic sequence similarity threshold.
| S-EPMC8969259 | biostudies-literature

A multi-SNP association test for complex diseases incorporating an optimal P-value threshold algorithm in nuclear families.
| S-EPMC4433014 | biostudies-literature