A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures

Jun Zhang, Imhoi Koo, Bing Wang, Qing Wei Gao, Chun Hou Zheng, Xiang Zhang

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, conflicting RI threshold settings are reported in the literature. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification using the test dataset. To imitate real identification experiments, the NIST MS main library was employed as the reference library and the test dataset was used as search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively, when RI and mass spectral similarity are integrated for compound identification. Compared with mass spectral matching alone, using both RI and mass spectral matching improves the identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. It is concluded that the improvement that RI matching brings to compound identification depends heavily on the mass spectral similarity measure used and on the accuracy of the RI data.
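As a rough illustration of how an RI threshold can be combined with spectral matching, the following minimal Python sketch discards library candidates whose RI differs from the query RI by more than a threshold, then ranks the remaining candidates by spectral similarity. Several things here are assumptions rather than details from the abstract: the hard RI window is only one plausible way to integrate the two scores, plain cosine similarity stands in for the NIST composite, DFT.R, and DWT.D measures, and the function names, compound names, spectra, and RI values are all hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Assumption: this is a stand-in for the similarity measures named in the
    abstract (NIST composite, DFT.R, DWT.D), which are not reproduced here.
    """
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query_spectrum, query_ri, library, ri_threshold=None):
    """Return the name of the best-scoring library entry.

    library is a list of (name, spectrum, ri) tuples. If ri_threshold is
    given, entries whose RI differs from query_ri by more than the threshold
    are skipped before scoring (a hard RI window, assumed for illustration).
    Returns None if no entry passes the window.
    """
    best_name, best_score = None, -1.0
    for name, spectrum, ri in library:
        if ri_threshold is not None and abs(ri - query_ri) > ri_threshold:
            continue
        score = cosine_similarity(query_spectrum, spectrum)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical toy library: two compounds with similar spectra but distant RIs.
library = [
    ("compound_A", {41: 30.0, 43: 100.0, 57: 80.0}, 1000.0),
    ("compound_B", {41: 35.0, 43: 100.0, 57: 70.0}, 1100.0),
]
query_spectrum = {41: 34.0, 43: 100.0, 57: 72.0}
query_ri = 1005.0

# Spectral matching alone versus spectral matching inside a 22 i.u. RI window.
ms_only = identify(query_spectrum, query_ri, library)
ms_plus_ri = identify(query_spectrum, query_ri, library, ri_threshold=22.0)
print(ms_only, ms_plus_ri)
```

In this toy case the spectrum of compound_B scores slightly higher against the query, so matching on spectra alone selects it, while the RI window removes it as a candidate and compound_A is selected instead. The 22 i.u. value is the threshold the abstract reports for the NIST composite measure; it is reused here only as an example setting and is not calibrated for cosine similarity.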

Original language: English (US)
Pages (from-to): 188-193
Number of pages: 6
Journal: Journal of Chromatography A
Volume: 1251
State: Published - Aug 17 2012

All Science Journal Classification (ASJC) codes

  • Analytical Chemistry
  • Biochemistry
  • Organic Chemistry
