Regio-MPNN: predicting regioselectivity for general metal-catalyzed cross-coupling reactions using a chemical knowledge informed message passing neural network†

Impact factor: 6.2 (JCR Q1, Chemistry, Multidisciplinary)
Baochen Li, Yuru Liu, Haibin Sun, Rentao Zhang, Yongli Xie, Klement Foo, Frankie S. Mak, Ruimao Zhang, Tianshu Yu, Sen Lin, Peng Wang and Xiaoxue Wang
Journal: Digital Discovery, 10, pp. 2019-2031. Published: 2024-09-03. DOI: 10.1039/D4DD00244J. Article page: https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00244j

Abstract

As a fundamental problem in organic chemistry, synthesis planning aims to design energy- and cost-efficient reaction pathways for target compounds. In synthesis planning, it is crucial to understand regioselectivity, that is, the preference of a reaction for one site over competing reaction sites. Accurately predicting regioselectivity enables early exclusion of unproductive reactions and paves the way to high-yielding synthetic routes with minimal separation and material costs. However, combining chemical knowledge with data-driven methods to make practical regioselectivity predictions is still at an early stage. At the same time, metal-catalyzed cross-coupling reactions have profoundly transformed medicinal chemistry and have thus become one of the most frequently encountered reaction types in synthesis planning. In this work, we introduce, for the first time, a chemical knowledge informed message passing neural network (MPNN) framework that directly identifies the intrinsic major products of metal-catalyzed cross-coupling reactions with regioselective ambiguity. Integrating first-principles and data-driven methods, our model achieves an overall accuracy of 96.51% on a test set covering eight typical metal-catalyzed cross-coupling reaction types (Suzuki–Miyaura, Stille, Sonogashira, Buchwald–Hartwig, Hiyama, Kumada, Negishi, and Heck reactions), outperforming other commonly used model types. To integrate electronic effects with steric effects in regioselectivity prediction, we propose a quantitative method to measure steric hindrance. Our steric hindrance checker can successfully identify regioselectivity induced solely by steric hindrance. Notably, under practical scenarios, our model outperforms six experimental organic chemists, who have on average more than 10 years of working experience in the organic synthesis industry, at predicting major products in regioselective cases. We have also demonstrated the practical use of our model by fixing routes designed by open-access synthesis planning software and by improving reactions through the identification of low-cost starting materials. To help general chemists make prompt decisions about regioselectivity, we have developed a free web-based, AI-powered tool. Our code and web tool are available at https://github.com/Chemlex-AI/regioselectivity and https://ai.tools.chemlex.com/region-choose, respectively.
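The abstract does not describe the network architecture, so the following is only a rough illustration of the general message-passing idea that an MPNN builds on, not the authors' Regio-MPNN. It assumes a toy molecular graph with hand-set atom features, a fixed averaging update instead of learned layers, and a linear readout that ranks candidate reaction sites. The function names `message_passing` and `rank_sites`, the feature values, and the weights are all hypothetical.

```python
from typing import Dict, List, Tuple


def message_passing(features: Dict[int, List[float]],
                    bonds: List[Tuple[int, int]],
                    steps: int = 1) -> Dict[int, List[float]]:
    """Run `steps` rounds of sum-aggregation over an undirected atom graph.

    Hypothetical stand-in for a learned MPNN layer: each atom averages its
    own state with the sum of its neighbors' states.
    """
    neighbors: Dict[int, List[int]] = {atom: [] for atom in features}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    hidden = {atom: list(vec) for atom, vec in features.items()}
    for _ in range(steps):
        updated = {}
        for atom, state in hidden.items():
            # Message: element-wise sum of the neighbors' current states.
            message = [0.0] * len(state)
            for nb in neighbors[atom]:
                message = [m + x for m, x in zip(message, hidden[nb])]
            # Update: fixed 50/50 mix (a trained model would learn this).
            updated[atom] = [0.5 * s + 0.5 * m for s, m in zip(state, message)]
        hidden = updated
    return hidden


def rank_sites(hidden: Dict[int, List[float]],
               candidate_sites: List[int],
               weights: List[float]) -> List[int]:
    """Order candidate reaction sites by a linear readout score, best first."""
    scores = {s: sum(w * x for w, x in zip(weights, hidden[s]))
              for s in candidate_sites}
    return sorted(candidate_sites, key=lambda s: scores[s], reverse=True)


# Toy chain of four atoms (0-1-2-3) with a single made-up feature per atom.
toy_features = {0: [1.0], 1: [0.0], 2: [0.0], 3: [0.0]}
toy_bonds = [(0, 1), (1, 2), (2, 3)]
toy_hidden = message_passing(toy_features, toy_bonds, steps=1)
toy_ranking = rank_sites(toy_hidden, candidate_sites=[1, 2], weights=[1.0])
```

In this toy case, atom 1 sits next to the only atom with a non-zero feature, so after one round it receives a larger state than atom 2 and is ranked first. A real model would replace the fixed update and readout with trained parameters and chemically meaningful atom and bond descriptors.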
