Reaction Impurity Prediction using a Data Mining Approach

IF 6.1 Q1 CHEMISTRY, MULTIDISCIPLINARY
Adarsh Arun, Dr. Zhen Guo, Dr. Simon Sung, Prof. Alexei A. Lapkin
Journal: Chemistry methods: new approaches to solving problems in chemistry
Published: 2023-03-27
DOI: 10.1002/cmtd.202200062
URL: https://onlinelibrary.wiley.com/doi/10.1002/cmtd.202200062
Citations: 1

Abstract

Automated prediction of reaction impurities is useful in early-stage reaction development, synthesis planning and optimization. Existing reaction predictors are geared towards main product prediction and are often black-box, which makes erroneous outcomes difficult to troubleshoot. This work presents an automated, interpretable impurity prediction workflow based on data mining of large chemical reaction databases. A 14-step workflow was implemented in Python and RDKit using Reaxys® data. Potential chemical reactions between functional groups that share a reaction environment in the user-supplied query species can be accurately evaluated by directly mining the Reaxys® database for similar, or ‘analogue’, reactions involving these functional groups. Reaction templates can then be extracted from the analogue reactions and applied to the relevant species in the original query to return impurities and transformations of interest. Three proof-of-concept case studies (paracetamol, agomelatine and lersivirine) were conducted, with the workflow correctly suggesting impurities within the top two outcomes. At all stages, suggested impurities can be traced back to the originating template and analogue reaction in the literature, allowing for closer inspection and user validation. Ultimately, because it is interpretable rather than purely black-box, this work could serve as a benchmark for more sophisticated algorithms or models.
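
The template-application step described in the abstract can be illustrated with a minimal RDKit sketch. It is not the authors' 14-step workflow: the two reaction templates are hand-written SMARTS chosen for illustration (in the paper they would be extracted from analogue reactions mined from Reaxys®), the analogue identifiers are placeholders, and the names `TEMPLATES`, `apply_template` and `suggest_products` are hypothetical. The query loosely follows the paracetamol case study, assuming 4-aminophenol, paracetamol and acetic anhydride share the reaction environment.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical, hand-written templates (NOT mined from Reaxys). Each keeps a
# placeholder for the analogue reaction it would be traced back to.
TEMPLATES = {
    "N-acylation": {
        "smarts": "[c:1][NX3H2:2].[CX3:3](=[O:4])[OX2][CX3]=O"
                  ">>[c:1][N:2]-[C:3]=[O:4]",
        "analogue": "placeholder-analogue-1",
    },
    "O-acylation": {
        "smarts": "[c:1][OX2H:2].[CX3:3](=[O:4])[OX2][CX3]=O"
                  ">>[c:1][O:2]-[C:3]=[O:4]",
        "analogue": "placeholder-analogue-2",
    },
}


def canonical(smiles):
    """Return RDKit's canonical SMILES for a SMILES string."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))


def apply_template(smarts, reactant_smiles):
    """Apply one reaction template; return the set of valid product SMILES."""
    rxn = AllChem.ReactionFromSmarts(smarts)
    reactants = tuple(Chem.MolFromSmiles(s) for s in reactant_smiles)
    products = set()
    for product_set in rxn.RunReactants(reactants):
        for mol in product_set:
            try:
                Chem.SanitizeMol(mol)  # reaction products come back unsanitized
            except Exception:
                continue  # skip chemically invalid outcomes
            products.add(Chem.MolToSmiles(mol))
    return products


def suggest_products(nucleophiles, acyl_donor):
    """Map each suggested product to the (template, analogue, substrate) behind it."""
    suggestions = {}
    for name, entry in TEMPLATES.items():
        for nuc in nucleophiles:
            for product in apply_template(entry["smarts"], (nuc, acyl_donor)):
                suggestions.setdefault(product, []).append(
                    (name, entry["analogue"], nuc)
                )
    return suggestions


# Query species assumed to be present in the same reaction environment.
suggestions = suggest_products(
    nucleophiles=["Nc1ccc(O)cc1", "CC(=O)Nc1ccc(O)cc1"],  # 4-aminophenol, paracetamol
    acyl_donor="CC(=O)OC(C)=O",                            # acetic anhydride
)
for product, provenance in suggestions.items():
    print(product, provenance)
```

Each suggested species carries the template name and analogue placeholder that produced it, mirroring the traceability the abstract emphasises. Ranking of outcomes, which the workflow needs in order to place impurities within the top two, is not attempted here.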

Source journal: CiteScore 7.30; self-citation rate 0.00%; articles published 0