Exploring beyond experiment: generating high-quality datasets of transition metal complexes with quantum chemistry and machine learning

Impact factor 6.8; Zone 2 (Engineering & Technology); JCR Q1 (BIOTECHNOLOGY & APPLIED MICROBIOLOGY)
Jacob W. Toney, Aaron G. Garrison, Weiliang Luo, Roland G. St. Michel, Sukrit Mukhopadhyay, Heather J. Kulik
Current Opinion in Chemical Engineering, vol. 50, Article 101189. Journal article, published 2025-10-04. DOI: 10.1016/j.coche.2025.101189. https://www.sciencedirect.com/science/article/pii/S2211339825001017
Citations: 0

Abstract

Machine learning (ML) approaches enable screening of the vast chemical space of transition metal complexes (TMCs) faster than either experiments or ab initio calculations, but their quality is highly dependent on the reference data used. Existing TMC datasets often leverage experimental structures, which biases methods trained on these data away from reactive configurations. Calculating the properties of these TMCs also introduces the challenge of assigning spin and oxidation states. Recent work on generating hypothetical TMCs with realistic connectivity and geometry has shown promise for extending datasets beyond experimental structures, especially when combined with ML approaches to identify complexes with desirable properties. Experimental measurements would be ideal for training and/or testing these models but are often scarce for TMCs, especially for those that are catalytically active. Thus, properties calculated with electronic structure theory are a popular alternative for training ML models. However, TMCs are challenging for many conventional electronic structure methods, and few benchmark datasets exist to assess which methods are most reliable and cost-effective. Many of the recommended methods are computationally demanding, leading to the use of neural network potentials as surrogate models for large-scale screening. By using emerging tools for TMC structure generation together with suitable electronic structure methods, increasingly high-quality datasets will be curated to enhance the predictive power of ML approaches for discovering novel TMCs, including in the development of neural network potentials. By more accurately predicting TMC properties, promising and practical candidates for catalysis, photosensitizers, molecular devices, and medicine will be identified.
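The spin-state part of this challenge can be made concrete with simple electron counting. The sketch below is illustrative and not taken from the paper: it assumes ionic counting (d-electron count equals group number minus formal oxidation state) and enumerates the spin multiplicities 2S+1 accessible to the resulting d^n configuration, from high spin (maximum unpaired electrons) down to low spin. The names `GROUP`, `d_electron_count`, and `spin_multiplicities` are hypothetical. The sketch deliberately ignores ligand-field strength, metal-ligand covalency, and redox-non-innocent ligands, which are what make the formal oxidation state, and hence the ground spin state, hard to assign in practice; one common approach is to compute each candidate multiplicity and compare their energies.

```python
# Hypothetical helper (not from the paper): enumerate candidate spin
# multiplicities for a first-row transition-metal ion, assuming ionic
# (oxidation-state) electron counting and a purely d-electron configuration.

# IUPAC group numbers for first-row transition metals (groups 3-12).
GROUP = {
    "Sc": 3, "Ti": 4, "V": 5, "Cr": 6, "Mn": 7,
    "Fe": 8, "Co": 9, "Ni": 10, "Cu": 11, "Zn": 12,
}


def d_electron_count(metal: str, oxidation_state: int) -> int:
    """d-electron count of the metal ion: group number minus oxidation state."""
    n = GROUP[metal] - oxidation_state
    if not 0 <= n <= 10:
        raise ValueError(f"{metal}({oxidation_state}) gives invalid d-count {n}")
    return n


def spin_multiplicities(n_d: int) -> list[int]:
    """Possible multiplicities 2S+1 for n_d electrons in five d orbitals.

    The maximum number of unpaired electrons is min(n_d, 10 - n_d)
    (high spin); each extra pairing removes two unpaired electrons,
    down to n_d % 2 (low spin).
    """
    max_unpaired = min(n_d, 10 - n_d)
    min_unpaired = n_d % 2
    # Number of unpaired electrons = 2S, so multiplicity = unpaired + 1.
    return [u + 1 for u in range(max_unpaired, min_unpaired - 1, -2)]


if __name__ == "__main__":
    for metal, ox in [("Fe", 2), ("Mn", 2), ("Co", 3), ("Cu", 2)]:
        n = d_electron_count(metal, ox)
        print(f"{metal}({ox}): d{n}, multiplicities {spin_multiplicities(n)}")
```

Running the script prints, for example, `Fe(2): d6, multiplicities [5, 3, 1]`: quintet (high spin), triplet, and singlet (low spin) candidates for Fe(II), each of which would need its own calculation before the ground state can be identified.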
Source journal: Current Opinion in Chemical Engineering
Categories: BIOTECHNOLOGY & APPLIED MICROBIOLOGY; ENGINEERING, CHEMICAL
CiteScore: 12.80
Self-citation rate: 3.00%
Articles published: 114
Journal description: Current Opinion in Chemical Engineering is devoted to publishing short, focused review articles written by experts on current advances in different areas of chemical engineering. Only invited review articles are published. The goals of each review article in Current Opinion in Chemical Engineering are:
1. To acquaint the reader or researcher with the most important recent papers on the given topic.
2. To provide the reader with the views and opinions of an expert on that topic.
The reviews are short (about 2500 words, or 5-10 printed pages with figures) and serve as an invaluable source of information for researchers, teachers, professionals, and students. The reviews also aim to stimulate the exchange of ideas among experts.
Themed sections: each review focuses on particular aspects of one of the following themed sections of chemical engineering:
1. Nanotechnology
2. Energy and environmental engineering
3. Biotechnology and bioprocess engineering
4. Biological engineering (covering tissue engineering, regenerative medicine, drug delivery)
5. Separation engineering (covering membrane technologies, adsorbents, desalination, distillation, etc.)
6. Materials engineering (covering biomaterials, inorganic and especially ceramic materials, nanostructured materials)
7. Process systems engineering
8. Reaction engineering and catalysis