A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery

IF 3.2 | Zone 3 (Chemistry) | JCR Q2: CHEMISTRY, PHYSICAL
Parastoo Semnani, Mihail Bogojeski, Florian Bley, Zizheng Zhang, Qiong Wu, Thomas Kneib, Jan Herrmann, Christoph Weisser, Florina Patcas, Klaus-Robert Müller
DOI: 10.1021/acs.jpcc.4c05332 (https://doi.org/10.1021/acs.jpcc.4c05332)
Journal: The Journal of Physical Chemistry C
Publication date: 2024-12-06
Publication type: Journal Article
Open access: no
Citations: 0

Abstract


The successful application of machine learning (ML) in catalyst design has been made difficult by the challenges associated with collecting high-quality and diverse data. Due to the complex interactions between catalyst components, the design of novel catalysts has long relied on trial-and-error, a costly and labor-intensive process that results in scarce data that is heavily biased toward undesired, low-yield catalysts. Such data presents a challenge for training ML models that generalize well to novel compositions, which is necessary for the success of ML-guided catalyst discovery. Despite the growing popularity of ML applications in this field, most efforts so far have not focused on dealing with the challenges presented by such experimental data. In this work, we introduce a robust ML and explainable artificial intelligence (XAI) framework that incorporates a series of well-established ML methods designed to improve model performance and provide reliable evaluations for catalytic yield classification in the context of scarce and class-imbalanced data. We apply this framework to classify the yields of different catalyst combinations in the oxidative coupling of methane reaction and use it to evaluate the performance of a range of ML models: tree-based models (such as decision trees, random forest, and gradient boosted trees), logistic regression, support vector machines, and neural networks. Our experiments demonstrate that the methods used in our framework lead to more robust performance estimates and reduce the effect of class imbalance on model training, resulting in significant improvements in the predictive capability of all but one of the evaluated models. Additionally, the XAI component of the framework analyzes the decision-making process of each ML model by identifying the most important features for predicting catalyst performance. 
Our analysis found that XAI methods that provide class-aware explanations, such as Layer-wise Relevance Propagation, managed to identify key components that contribute specifically to high-yield catalysts. These findings align with chemical intuition and existing literature, reinforcing their validity. We believe this framework can serve as a blueprint and a set of best practices for ML applications in catalysis, driving future research while delivering robust models and actionable insights that can assist chemists in designing and discovering novel catalysts with superior performance.
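The abstract does not name the specific evaluation methods the framework uses. As a minimal sketch of why class imbalance calls for careful evaluation, the toy example below assumes two common choices, stratified folds and balanced accuracy, and applies them to hypothetical labels (90 low-yield, 10 high-yield). The function names and the data are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class,
    so every fold keeps roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 0 = low-yield catalyst, 1 = high-yield catalyst.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, k=5)

# A trivial baseline that always predicts the majority (low-yield) class.
for fold in folds:
    y_true = [labels[i] for i in fold]
    y_pred = [0] * len(fold)
    print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

On these toy labels every fold holds 18 low-yield and 2 high-yield samples, so the always-low-yield baseline scores 0.9 plain accuracy but only 0.5 balanced accuracy. A metric that weights classes equally exposes a model that never finds a high-yield catalyst.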
Source journal: The Journal of Physical Chemistry C (Chemistry - Materials Science, Multidisciplinary)
CiteScore: 6.50
Self-citation rate: 8.10%
Articles published: 2047
Review time: 1.8 months
About the journal: The Journal of Physical Chemistry A/B/C is devoted to reporting new and original experimental and theoretical basic research of interest to physical chemists, biophysical chemists, and chemical physicists.