Exploring beyond experiment: generating high-quality datasets of transition metal complexes with quantum chemistry and machine learning
Jacob W Toney, Aaron G Garrison, Weiliang Luo, Roland G St. Michel, Sukrit Mukhopadhyay, Heather J Kulik
Current Opinion in Chemical Engineering, volume 50, Article 101189. Published 2025-10-04. DOI: 10.1016/j.coche.2025.101189
https://www.sciencedirect.com/science/article/pii/S2211339825001017
Abstract
Machine learning (ML) approaches enable screening of the vast chemical space of transition metal complexes (TMCs) at faster speeds than either experimental approaches or ab initio calculations, but their quality is highly dependent on the reference data used. Existing TMC datasets often leverage experimental structures, which biases methods trained on this data away from reactive configurations. Calculating properties of these TMCs also introduces challenges of spin and oxidation state assignment. Recent work on generating hypothetical TMCs with realistic connectivity and geometry has demonstrated promise to extend datasets beyond experimental structures, especially when combined with ML approaches to identify complexes with desirable properties. Experimental measurements would be ideal to train and/or test these models but are often scarce for TMCs, especially for those that are catalytically active. Thus, properties calculated with electronic structure theory are a popular alternative choice for training ML models. However, TMCs are challenging for many conventional electronic structure methods, and few benchmark datasets exist to assess which methods are most reliable and cost-effective. Many of the recommended methods are computationally demanding, leading to the use of neural network potentials as surrogate models for large-scale screening. By utilizing emerging tools for TMC structure generation and suitable electronic structure methods, increasingly high-quality datasets will be curated to enhance the predictive power of ML approaches to discover novel TMCs, including in the development of neural network potentials. By more accurately predicting TMC properties, promising and practical candidates for catalysis, photosensitizers, molecular devices, and medicine will be identified.
About the journal:
Current Opinion in Chemical Engineering is devoted to bringing forth short and focused review articles written by experts on current advances in different areas of chemical engineering. Only invited review articles will be published.
The goals of each review article in Current Opinion in Chemical Engineering are:
1. To acquaint the reader/researcher with the most important recent papers in the given topic.
2. To provide the reader with the views/opinions of the expert in each topic.
The reviews are short (about 2500 words or 5-10 printed pages with figures) and serve as an invaluable source of information for researchers, teachers, professionals and students. The reviews also aim to stimulate exchange of ideas among experts.
Themed sections:
Each review will focus on particular aspects of one of the following themed sections of chemical engineering:
1. Nanotechnology
2. Energy and environmental engineering
3. Biotechnology and bioprocess engineering
4. Biological engineering (covering tissue engineering, regenerative medicine, drug delivery)
5. Separation engineering (covering membrane technologies, adsorbents, desalination, distillation etc.)
6. Materials engineering (covering biomaterials, inorganic especially ceramic materials, nanostructured materials).
7. Process systems engineering
8. Reaction engineering and catalysis.