Emilio Dorigatti, Jonathan Groß, Jonas Kühlborn, Robert Möckel, Frank Maier and Julian Keupp
{"title":"通过迁移学习和领域知识增强串联质谱中药物杂质结构的自动解析","authors":"Emilio Dorigatti, Jonathan Groß, Jonas Kühlborn, Robert Möckel, Frank Maier and Julian Keupp","doi":"10.1039/D5DD00115C","DOIUrl":null,"url":null,"abstract":"<p >Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is an essential analytical technique in the pharmaceutical industry, used particularly for elucidating the structure of unknown impurities in the synthesis of active pharmaceutical ingredients. However, the interpretation of mass spectra is challenging and time-consuming, requiring significant expertise. While recent computational tools aimed at automating this process have been developed, their accuracy in determining the chemical structure limits its use in practice. In this paper, we introduce a new method called SEISMiQ for elucidating unknown impurities from their MS/MS spectra. We are able to significantly improve elucidation accuracy by integrating domain experts' knowledge, specifically the impurity sum formula and known substructure, into the model's training and inference process. Further performance improvements can be achieved through transfer learning using simulated MS/MS spectra of impurities from an in-house database. Finally, the need for any experimental data collection for finetuning can be circumvented by simulating the entire drug substance synthesis process <em>in silico via</em> reaction templates.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2454-2464"},"PeriodicalIF":6.2000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00115c?page=search","citationCount":"0","resultStr":"{\"title\":\"Enhancing automated drug substance impurity structure elucidation from tandem mass spectra through transfer learning and domain knowledge†\",\"authors\":\"Emilio Dorigatti, Jonathan Groß, Jonas Kühlborn, Robert Möckel, Frank Maier and Julian Keupp\",\"doi\":\"10.1039/D5DD00115C\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p >Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is an essential analytical technique in the pharmaceutical industry, used particularly for elucidating the structure of unknown impurities in the synthesis of active pharmaceutical ingredients. However, the interpretation of mass spectra is challenging and time-consuming, requiring significant expertise. While recent computational tools aimed at automating this process have been developed, their accuracy in determining the chemical structure limits its use in practice. In this paper, we introduce a new method called SEISMiQ for elucidating unknown impurities from their MS/MS spectra. We are able to significantly improve elucidation accuracy by integrating domain experts' knowledge, specifically the impurity sum formula and known substructure, into the model's training and inference process. Further performance improvements can be achieved through transfer learning using simulated MS/MS spectra of impurities from an in-house database. Finally, the need for any experimental data collection for finetuning can be circumvented by simulating the entire drug substance synthesis process <em>in silico via</em> reaction templates.</p>\",\"PeriodicalId\":72816,\"journal\":{\"name\":\"Digital discovery\",\"volume\":\" 9\",\"pages\":\" 2454-2464\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00115c?page=search\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Digital discovery\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00115c\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital discovery","FirstCategoryId":"1085","ListUrlMain":"https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00115c","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Enhancing automated drug substance impurity structure elucidation from tandem mass spectra through transfer learning and domain knowledge†
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is an essential analytical technique in the pharmaceutical industry, used particularly for elucidating the structure of unknown impurities in the synthesis of active pharmaceutical ingredients. However, the interpretation of mass spectra is challenging and time-consuming, requiring significant expertise. While recent computational tools aimed at automating this process have been developed, their accuracy in determining the chemical structure limits its use in practice. In this paper, we introduce a new method called SEISMiQ for elucidating unknown impurities from their MS/MS spectra. We are able to significantly improve elucidation accuracy by integrating domain experts' knowledge, specifically the impurity sum formula and known substructure, into the model's training and inference process. Further performance improvements can be achieved through transfer learning using simulated MS/MS spectra of impurities from an in-house database. Finally, the need for any experimental data collection for finetuning can be circumvented by simulating the entire drug substance synthesis process in silico via reaction templates.