TY - JOUR
T1 - A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures
AU - Zhang, Jun
AU - Koo, Imhoi
AU - Wang, Bing
AU - Gao, Qing Wei
AU - Zheng, Chun Hou
AU - Zhang, Xiang
N1 - Funding Information:
This work was supported by Grant 1RO1GM087735 from the National Institute of General Medical Sciences (NIGMS) within the National Institutes of Health (NIH), National Natural Science Foundation of China under grant nos. 61032007 and 60803107, and Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province under grant no. KJ2012A005.
PY - 2012/8/17
Y1 - 2012/8/17
N2 - Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, many conflicting RI threshold settings have been reported in the literature. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification using the test dataset. To imitate real identification experiments, the NIST MS main library was employed as the reference library and the test dataset was used as search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively, when RI and mass spectral similarity are integrated for compound identification. Compared with mass spectrum matching alone, using both RI and mass spectral matching can improve the identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. It is concluded that the improvement RI matching brings to compound identification depends heavily on the mass spectral similarity measure used and on the accuracy of the RI data.
AB - Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, many conflicting RI threshold settings have been reported in the literature. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification using the test dataset. To imitate real identification experiments, the NIST MS main library was employed as the reference library and the test dataset was used as search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively, when RI and mass spectral similarity are integrated for compound identification. Compared with mass spectrum matching alone, using both RI and mass spectral matching can improve the identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. It is concluded that the improvement RI matching brings to compound identification depends heavily on the mass spectral similarity measure used and on the accuracy of the RI data.
UR - http://www.scopus.com/inward/record.url?scp=84864062516&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864062516&partnerID=8YFLogxK
U2 - 10.1016/j.chroma.2012.06.036
DO - 10.1016/j.chroma.2012.06.036
M3 - Article
C2 - 22771253
AN - SCOPUS:84864062516
SN - 0021-9673
VL - 1251
SP - 188
EP - 193
JO - Journal of Chromatography A
JF - Journal of Chromatography A
ER -