Optimized De Novo Molecular Generation (OMG) for Mass Spectra Annotation Using Transfer and Reinforcement Learning
Authors: Margaret R. Martin and Soha Hassoun*
Journal: Analytical Chemistry, 2025, 97 (38), 20734–20742
Published: 2025-09-16
DOI: 10.1021/acs.analchem.5c01770
URL: https://pubs.acs.org/doi/10.1021/acs.analchem.5c01770
Citations: 0
Abstract
Despite the growth of spectral reference libraries and the availability of annotation tools, the rate of assigning molecular structures to tandem mass spectra remains low. Because not all chemical products are known or cataloged in databases, generative AI models are poised to address this gap through de novo structural candidate generation. We develop Optimized Molecular Generation (OMG), a novel de novo molecular generation method for mass spectra annotation. OMG comprises two steps: molecular generation and candidate ranking. During molecular generation, we finetune an unbiased prior molecular generation model using transfer learning on molecules retrieved from PubChem based on a target molecular formula. We then use reinforcement learning with custom scoring functions to create a curriculum-learning scheme that guides the generation of novel molecular candidates for a queried spectrum. After sampling the finetuned model, we rank the generated candidate structures. OMG finetunes REINVENT4’s pretrained molecular generator and ranks generated molecules using two recent ranking models, JESTR and ESP. We evaluate OMG on the CANOPUS and MassSpecGym data sets, on which it achieves top-1 accuracies of 10.51% and 2.42%, respectively, outperforming current baselines. Our work highlights the promise of transfer and reinforcement learning in guiding de novo generation for spectra annotation.
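As a rough illustration of two ideas mentioned in the abstract, the sketch below shows a formula-based reward and a top-k accuracy metric. It is not OMG's implementation and does not use the REINVENT4, JESTR, or ESP APIs. The function names (`parse_formula`, `formula_score`, `top_k_accuracy`) and the linear penalty on atom-count mismatch are assumptions made for this example only; the abstract does not specify how the custom scoring functions are defined.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Parse a flat molecular formula such as 'C6H12O6' into element counts.

    Assumption: no parentheses, charges, or isotope labels are present.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_score(candidate: str, target: str) -> float:
    """Hypothetical reward in [0, 1] for matching a target molecular formula.

    Returns 1.0 for an exact match and decreases linearly with the total
    atom-count mismatch, normalized by the number of atoms in the target.
    """
    cand, targ = parse_formula(candidate), parse_formula(target)
    total = sum(targ.values())
    if total == 0:
        return 0.0
    mismatch = sum(abs(cand[e] - targ[e]) for e in set(cand) | set(targ))
    return max(0.0, 1.0 - mismatch / total)


def top_k_accuracy(ranked_candidates, true_structures, k=1):
    """Fraction of queries whose true structure is among the top-k candidates.

    ranked_candidates: one best-first list of structure identifiers per query.
    true_structures: the reference identifier for each query, in the same order.
    """
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_candidates, true_structures)
    )
    return hits / len(true_structures)
```

In this sketch, structures are compared as plain strings. A real evaluation would compare canonicalized identifiers (for example, canonical SMILES or InChIKeys), so that different notations for the same molecule count as a match.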
About the journal:
Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.