Optimized De Novo Molecular Generation (OMG) for Mass Spectra Annotation Using Transfer and Reinforcement Learning

IF 6.7 | Zone 1, Chemistry | Q1, CHEMISTRY, ANALYTICAL
Margaret R. Martin and Soha Hassoun*
Analytical Chemistry, vol. 97, no. 38, pp. 20734–20742. Published 2025-09-16. DOI: 10.1021/acs.analchem.5c01770
https://pubs.acs.org/doi/10.1021/acs.analchem.5c01770

Abstract

Despite the growth of spectral reference libraries and the availability of annotation tools, the rate of assigning molecular structures to tandem mass spectra remains low. Because not all chemical products are known or cataloged in databases, generative AI models are poised to address this gap through de novo generation of structural candidates. We develop a novel method, Optimized Molecular Generation (OMG), for de novo molecular generation for mass spectra annotation. OMG comprises two steps: molecular generation and candidate ranking. During molecular generation, we fine-tune a prior, unbiased molecular generation model using transfer learning on molecules retrieved from PubChem based on a target molecular formula. Using reinforcement learning, we use custom scoring functions to create a curriculum-learning scheme that guides the generation of novel molecular candidates for a queried spectrum. After sampling the fine-tuned model, we rank the generated candidate structures. OMG fine-tunes REINVENT4's pretrained molecular generator and ranks generated molecules using two recent ranking models, JESTR and ESP. We evaluate OMG on the CANOPUS and MassSpecGym data sets, on which OMG achieves top-1 accuracies of 10.51% and 2.42%, respectively, outperforming current baselines. Our work highlights the promise of transfer and reinforcement learning for guiding de novo generation in spectra annotation.
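To make two ideas from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It illustrates (a) what a formula-based scoring function could look like when generation is steered toward a target molecular formula, and (b) how top-1 accuracy is computed over ranked candidate lists. The function names `parse_formula`, `formula_score`, and `top_k_accuracy` and the specific scoring formula are assumptions made for illustration; the paper's actual scoring functions, and the JESTR and ESP rankers, are not reproduced here.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C6" or "Cl".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat molecular formula such as 'C6H12O6' into element counts.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_score(candidate: str, target: str) -> float:
    """Hypothetical reward in [0, 1]; 1.0 means identical element counts."""
    cand, targ = parse_formula(candidate), parse_formula(target)
    # Sum of absolute per-element count differences.
    mismatch = sum(abs(cand[e] - targ[e]) for e in set(cand) | set(targ))
    total = sum(cand.values()) + sum(targ.values())
    return 1.0 - mismatch / total if total else 0.0


def top_k_accuracy(ranked, truths, k=1):
    """Fraction of queries whose true structure is among the first k candidates.

    Assumption: structures are compared as canonical identifier strings.
    """
    hits = sum(truth in cands[:k] for cands, truth in zip(ranked, truths))
    return hits / len(truths)


if __name__ == "__main__":
    print(formula_score("C6H12O5", "C6H12O6"))
    print(top_k_accuracy([["A", "B"], ["C"]], ["A", "D"], k=1))
```

A graded score of this kind, rather than a strict match/no-match test, gives partial credit to near-miss formulas, which is one plausible way to supply a smoother reward signal during reinforcement learning.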


Source journal: Analytical Chemistry (Chemistry: Analytical Chemistry)
CiteScore: 12.10
Self-citation rate: 12.20%
Articles published: 1949
Review time: 1.4 months
About the journal: Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.