Comparative analysis of continuous similarity measures for compound identification in mass spectrometry-based metabolomics
Authors: Hunter Dlugas, Xiang Zhang, Seongho Kim
Journal: Chemometrics and Intelligent Laboratory Systems, volume 263, Article 105417 (published 2025-05-03)
DOI: 10.1016/j.chemolab.2025.105417
URL: https://www.sciencedirect.com/science/article/pii/S0169743925001029
In mass spectrometry (MS)-based metabolomics, the most straightforward and efficient approach for compound identification is the comparison of similarity scores between experimental spectra and reference spectra. Among various single and composite similarity measures, the Cosine Correlation is favored due to its simplicity, efficiency, and effectiveness. Recently, the Shannon Entropy Correlation has shown superior performance over several other measures, including the Cosine Correlation, in LC-MS-based metabolomics, particularly concerning receiver operating characteristic (ROC) curves and false discovery rates. However, previous comparisons did not consider the weight factor transformation, which is critical for achieving higher accuracy with the Cosine Correlation. This study conducted a comparative analysis of the Cosine Correlation and Shannon Entropy Correlation, incorporating the weight factor transformation during preprocessing. Additionally, we developed a novel entropy correlation measure, the Tsallis Entropy Correlation, which offers greater versatility than the Shannon Entropy Correlation. Our accuracy-based results indicate that the weight factor transformation is essential for achieving higher identification performance in both LC-MS- and GC-MS-based compound identification. Although the Tsallis Entropy Correlation outperforms the Shannon Entropy Correlation in terms of accuracy, it comes with higher computational expense. In contrast, the Cosine Correlation, when combined with the weight factor transformation, achieves the highest accuracy and the lowest computational expense, demonstrating both robustness and efficiency in MS-based compound identification.
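The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: both spectra are taken to be already aligned on a shared m/z grid; the weight factor transformation is written in the commonly used form (m/z)^a * intensity^b with illustrative exponents a = 0.5 and b = 1.0; and the entropy-based score uses the widely cited form 1 - (2*S_AB - S_A - S_B) / ln 4. The abstract does not give the paper's exact Tsallis Entropy Correlation, so only the standard Tsallis entropy is shown, which reduces to Shannon entropy as q approaches 1. All function names are hypothetical.

```python
import math

def weight_transform(mz, intensity, a=0.5, b=1.0):
    # Assumed form of the weight factor transformation: (m/z)^a * intensity^b.
    # The exponents a and b are illustrative defaults, not values from the paper.
    return [(m ** a) * (i ** b) for m, i in zip(mz, intensity)]

def cosine_correlation(x, y):
    # Cosine of the angle between two aligned intensity vectors.
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
    return dot / norm if norm > 0 else 0.0

def normalize(x):
    # Scale intensities to sum to 1 so they can be treated as probabilities.
    total = sum(x)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [v / total for v in x]

def shannon_entropy(p):
    # Shannon entropy in nats; zero-probability bins contribute nothing.
    return -sum(v * math.log(v) for v in p if v > 0)

def tsallis_entropy(p, q):
    # Standard Tsallis entropy; falls back to Shannon entropy at q = 1.
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(p)
    return (1.0 - sum(v ** q for v in p if v > 0)) / (q - 1.0)

def shannon_entropy_similarity(x, y):
    # Assumed form: 1 - (2*S_AB - S_A - S_B) / ln(4), where S_AB is the
    # entropy of the 1:1 mixture of the two normalized spectra.
    p, r = normalize(x), normalize(y)
    mix = [(u + v) / 2 for u, v in zip(p, r)]
    gap = 2 * shannon_entropy(mix) - shannon_entropy(p) - shannon_entropy(r)
    return 1.0 - gap / math.log(4)

# Hypothetical aligned spectra (same m/z grid for query and reference).
mz = [50.0, 77.0, 105.0, 182.0]
query = [10.0, 40.0, 100.0, 25.0]
reference = [12.0, 35.0, 100.0, 30.0]

score_cosine = cosine_correlation(
    weight_transform(mz, query), weight_transform(mz, reference)
)
score_entropy = shannon_entropy_similarity(query, reference)
```

With these definitions, identical spectra score 1 under both measures, and spectra with no shared peaks score 0. In a library search, the reference spectrum with the highest score would be reported as the candidate identification.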
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.