分子机器学习方法对映选择性C-H键激活反应:从生成人工智能到实验验证

IF 7.6 1区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY
Ajnabiul Hoque, Taiwei Chang, Jin-Quan Yu, Raghavan B. Sunoj
{"title":"分子机器学习方法对映选择性C-H键激活反应:从生成人工智能到实验验证","authors":"Ajnabiul Hoque, Taiwei Chang, Jin-Quan Yu, Raghavan B. Sunoj","doi":"10.1039/d5sc01098e","DOIUrl":null,"url":null,"abstract":"Molecular machine learning (ML) has gained considerable attention in recent years. Developing ML algorithms for chemical reaction prediction is a formidable task, due to the small-sized reaction data it often presents, besides the sparsity and skewed distribution. While previous ML studies offered effective predictions on known reactions, efforts in using deep generative models for guiding new reactions and their prospective validation are rare. We harness both predictive and explorative abilities of deep learning on an important catalytic asymmetric β-C(sp3)–H activation reaction, consisting of 220 experimentally reported examples that differs primarily in terms of the substrate, catalyst, and coupling partner. A transfer learning approach using a chemical language model, pretrained on 1 million unlabeled molecules followed by fine-tuning on this reaction data set, is adopted. Our ensemble prediction (EnP) model, where 30 fine-tuned CLMs concurrently predict the %ee of test set reactions, is highly reliable. Another language model, fine-tuned on the 77 known chiral ligands as used in the above reactions, is employed for generating novel ligands of high validity and novelty. A proof of concept wet-lab experimental validation reveals that most of the ML-generated reactions are in excellent agreement with the EnP predictions. Results also caution the prospects of ML-driven reaction development for ligand design and emphasize the importance of domain experts in key decisions.","PeriodicalId":9909,"journal":{"name":"Chemical Science","volume":"458 1","pages":""},"PeriodicalIF":7.6000,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Molecular Machine Learning Approach to Enantioselective C–H Bond Activation Reactions: From Generative AI to Experimental Validation\",\"authors\":\"Ajnabiul Hoque, Taiwei Chang, Jin-Quan Yu, Raghavan B. Sunoj\",\"doi\":\"10.1039/d5sc01098e\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Molecular machine learning (ML) has gained considerable attention in recent years. Developing ML algorithms for chemical reaction prediction is a formidable task, due to the small-sized reaction data it often presents, besides the sparsity and skewed distribution. While previous ML studies offered effective predictions on known reactions, efforts in using deep generative models for guiding new reactions and their prospective validation are rare. We harness both predictive and explorative abilities of deep learning on an important catalytic asymmetric β-C(sp3)–H activation reaction, consisting of 220 experimentally reported examples that differs primarily in terms of the substrate, catalyst, and coupling partner. A transfer learning approach using a chemical language model, pretrained on 1 million unlabeled molecules followed by fine-tuning on this reaction data set, is adopted. Our ensemble prediction (EnP) model, where 30 fine-tuned CLMs concurrently predict the %ee of test set reactions, is highly reliable. Another language model, fine-tuned on the 77 known chiral ligands as used in the above reactions, is employed for generating novel ligands of high validity and novelty. A proof of concept wet-lab experimental validation reveals that most of the ML-generated reactions are in excellent agreement with the EnP predictions. Results also caution the prospects of ML-driven reaction development for ligand design and emphasize the importance of domain experts in key decisions.\",\"PeriodicalId\":9909,\"journal\":{\"name\":\"Chemical Science\",\"volume\":\"458 1\",\"pages\":\"\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Chemical Science\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://doi.org/10.1039/d5sc01098e\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Chemical Science","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1039/d5sc01098e","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

近年来,分子机器学习(ML)得到了广泛的关注。开发用于化学反应预测的ML算法是一项艰巨的任务,因为它经常呈现小尺寸的反应数据,除了稀疏性和偏态分布。虽然以前的机器学习研究提供了对已知反应的有效预测,但使用深度生成模型来指导新反应及其前瞻性验证的努力很少。我们在一个重要的催化不对称β-C(sp3) -H活化反应上利用深度学习的预测和探索能力,包括220个实验报告的例子,这些例子主要在底物、催化剂和偶联伙伴方面有所不同。采用化学语言模型的迁移学习方法,在100万个未标记分子上进行预训练,然后对该反应数据集进行微调。我们的集合预测(EnP)模型是高度可靠的,其中30个微调的clm同时预测测试集反应的百分比ee。另一种语言模型对上述反应中使用的77种已知手性配体进行了微调,用于生成高效度和新颖性的新型配体。概念验证湿实验室实验验证表明,大多数ml生成的反应与EnP预测非常一致。结果也提醒了机器学习驱动的配体设计反应发展的前景,并强调了领域专家在关键决策中的重要性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Molecular Machine Learning Approach to Enantioselective C–H Bond Activation Reactions: From Generative AI to Experimental Validation
Molecular machine learning (ML) has gained considerable attention in recent years. Developing ML algorithms for chemical reaction prediction is a formidable task, due to the small-sized reaction data it often presents, besides the sparsity and skewed distribution. While previous ML studies offered effective predictions on known reactions, efforts in using deep generative models for guiding new reactions and their prospective validation are rare. We harness both predictive and explorative abilities of deep learning on an important catalytic asymmetric β-C(sp3)–H activation reaction, consisting of 220 experimentally reported examples that differs primarily in terms of the substrate, catalyst, and coupling partner. A transfer learning approach using a chemical language model, pretrained on 1 million unlabeled molecules followed by fine-tuning on this reaction data set, is adopted. Our ensemble prediction (EnP) model, where 30 fine-tuned CLMs concurrently predict the %ee of test set reactions, is highly reliable. Another language model, fine-tuned on the 77 known chiral ligands as used in the above reactions, is employed for generating novel ligands of high validity and novelty. A proof of concept wet-lab experimental validation reveals that most of the ML-generated reactions are in excellent agreement with the EnP predictions. Results also caution the prospects of ML-driven reaction development for ligand design and emphasize the importance of domain experts in key decisions.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Chemical Science
Chemical Science CHEMISTRY, MULTIDISCIPLINARY-
CiteScore
14.40
自引率
4.80%
发文量
1352
审稿时长
2.1 months
期刊介绍: Chemical Science is a journal that encompasses various disciplines within the chemical sciences. Its scope includes publishing ground-breaking research with significant implications for its respective field, as well as appealing to a wider audience in related areas. To be considered for publication, articles must showcase innovative and original advances in their field of study and be presented in a manner that is understandable to scientists from diverse backgrounds. However, the journal generally does not publish highly specialized research.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信