Enhancing automated drug substance impurity structure elucidation from tandem mass spectra through transfer learning and domain knowledge†

Impact factor: 6.2; JCR Q1 (Chemistry, Multidisciplinary)
Emilio Dorigatti, Jonathan Groß, Jonas Kühlborn, Robert Möckel, Frank Maier and Julian Keupp
Journal: Digital Discovery, volume 9, pages 2454-2464
Publication date: 2025-07-24
DOI: 10.1039/D5DD00115C
Article page: https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00115c
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00115c?page=search
Citations: 0

Abstract

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is an essential analytical technique in the pharmaceutical industry, used in particular to elucidate the structures of unknown impurities arising during the synthesis of active pharmaceutical ingredients. However, interpreting mass spectra is challenging and time-consuming and requires significant expertise. Although computational tools that aim to automate this process have recently been developed, their limited accuracy in determining chemical structures restricts their use in practice. In this paper, we introduce a new method, SEISMiQ, for elucidating unknown impurities from their MS/MS spectra. We significantly improve elucidation accuracy by integrating domain experts' knowledge, specifically the impurity's sum formula and a known substructure, into the model's training and inference process. Further performance improvements can be achieved through transfer learning on simulated MS/MS spectra of impurities from an in-house database. Finally, the need to collect any experimental data for fine-tuning can be circumvented by simulating the entire drug substance synthesis process in silico via reaction templates.
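To make the role of the sum formula as a constraint concrete, the following is a minimal Python sketch. It is not the SEISMiQ implementation: the abstract states that the formula and a known substructure are integrated into the model's training and inference, whereas this sketch only post-filters a list of candidate formulas against an expert-provided one. The function names (`parse_formula`, `monoisotopic_mass`, `filter_by_formula`), the candidate list, and the small element table are all assumptions made for illustration.

```python
import re

# Monoisotopic masses (in u) for a few common elements.
# Illustrative subset only; extend for other elements.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
    "F": 18.9984032,
    "Cl": 34.9688527,
}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat sum formula such as 'C8H10N4O2' into element counts."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: dict[str, int] = {}
    for element, number in _TOKEN.findall(formula):
        # A missing number means a count of one (e.g. the 'O' in 'C2H6O').
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of the neutral molecule described by the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def filter_by_formula(
    candidates: list[tuple[str, str]], expert_formula: str
) -> list[str]:
    """Keep the names of candidates whose element counts match the expert formula."""
    target = parse_formula(expert_formula)
    return [name for name, formula in candidates if parse_formula(formula) == target]


# Hypothetical candidates as (name, sum formula) pairs.
candidates = [
    ("candidate_a", "C8H10N4O2"),
    ("candidate_b", "C8H9NO2"),
    ("candidate_c", "O2N4H10C8"),  # same composition, different element order
]
kept = filter_by_formula(candidates, "C8H10N4O2")
```

Here `kept` contains `candidate_a` and `candidate_c`, since element order does not affect the comparison. A post-hoc filter like this can only discard inconsistent candidates; conditioning the model itself on the formula, as the abstract describes, can also steer generation toward consistent structures.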

