The Power of Many: An Ensemble Approach to Spectral Similarity

Impact factor 2.7 · Zone 2 (Chemistry) · JCR Q2, Biochemical Research Methods
Javier E. Flores, David J. Degnan, Yuri E. Corilo, Chaevien S. Clendinen, and Lisa M. Bramer*
Journal of the American Society for Mass Spectrometry, vol. 36, no. 10, pp. 2164–2170. Published 2025-09-05. Publication type: Journal Article.
DOI: 10.1021/jasms.5c00176
https://pubs.acs.org/doi/10.1021/jasms.5c00176
Citations: 0

Abstract

Quantifying the similarity between two mass spectra (a known reference mass spectrum and an unidentified sample mass spectrum) is at the heart of compound identification workflows in gas chromatography–mass spectrometry (GC-MS). The reference spectrum most like the sample is assigned as its identification (provided some quantitative similarity threshold is met, e.g., 80%), and thus accurately measuring similarity is essential. Significant research has gone toward developing metrics for this purpose, each of which has attempted to improve upon existing methods by incorporating GC-MS-specific information (e.g., peak ratios or retention times) or adopting various statistical and algorithmic frameworks. While this active development has led to a plethora of similarity metrics with demonstrated value across different contexts, the unfortunate consequence has been confusion surrounding which metric should be used as a global standard. No metric is currently accepted as the standard method because different metrics have demonstrated optimal performance in different contexts. In this work, we propose an ensemble approach to spectral similarity scoring that combines the collective information from across existing similarity metrics to form an improved, globally representative similarity metric as a step toward establishing a global standard method. The resulting ensemble metrics are evaluated on over 88,000 spectra of varying complexity and demonstrate an improved ability to rank the correct reference spectrum as the top-matching candidate for a sample, relative to the rankings generated by individual similarity scores.
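The workflow the abstract describes (score a sample against each reference, combine several similarity metrics, rank the references, and accept the top match only above a threshold) can be sketched as follows. This is a minimal illustration and not the authors' method. The abstract does not say which metrics are combined or how, so the two metrics here (cosine similarity and a histogram-intersection overlap), the unweighted-mean combiner, the function names, and the toy intensity vectors are all assumptions. Spectra are assumed to be already aligned to a common m/z grid.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two aligned intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def overlap_similarity(a, b):
    """Histogram intersection after scaling each vector to sum to 1."""
    sum_a, sum_b = sum(a), sum(b)
    if sum_a == 0 or sum_b == 0:
        return 0.0
    return sum(min(x / sum_a, y / sum_b) for x, y in zip(a, b))

def ensemble_score(sample, reference, metrics):
    """Unweighted mean of the metric scores (illustrative combiner only)."""
    scores = [metric(sample, reference) for metric in metrics]
    return sum(scores) / len(scores)

def rank_references(sample, library, metrics):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [
        (name, ensemble_score(sample, reference, metrics))
        for name, reference in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def identify(sample, library, metrics, threshold=0.8):
    """Assign the top-ranked reference if its score meets the threshold."""
    best_name, best_score = rank_references(sample, library, metrics)[0]
    return (best_name if best_score >= threshold else None), best_score

# Toy data: hypothetical intensities on a shared five-bin m/z grid.
METRICS = [cosine_similarity, overlap_similarity]
library = {
    "compound_A": [12, 0, 45, 100, 3],
    "compound_B": [100, 40, 0, 5, 0],
}
sample = [10, 0, 50, 100, 5]

name, score = identify(sample, library, METRICS)
print(name, round(score, 3))
```

The 0.8 default mirrors the 80% threshold given as an example in the abstract. A real ensemble might weight or learn the combination instead of averaging, and the metrics would need comparable scales before being combined.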


Source journal
CiteScore: 5.50
Self-citation rate: 9.40%
Articles published: 257
Review time: 1 month
About the journal: The Journal of the American Society for Mass Spectrometry presents research papers covering all aspects of mass spectrometry, incorporating coverage of fields of scientific inquiry in which mass spectrometry can play a role. Comprehensive in scope, the journal publishes papers on both fundamentals and applications of mass spectrometry. Fundamental subjects include instrumentation principles, design, and demonstration, structures and chemical properties of gas-phase ions, studies of thermodynamic properties, ion spectroscopy, chemical kinetics, mechanisms of ionization, theories of ion fragmentation, cluster ions, and potential energy surfaces. In addition to full papers, the journal offers Communications, Application Notes, and Accounts and Perspectives.