Small-dataset-orientated data-driven screening for catalytic propane activation

Artificial intelligence chemistry Pub Date: 2024-12-07 DOI: 10.1016/j.aichem.2024.100083

Jiaqi Chen, Junqing Li, Ziyi Liu, Shitao Sun, Shijia Zhou, Dongqi Wang

Artificial intelligence chemistry, Volume 3, Issue 1, Article 100083. Journal Article. https://www.sciencedirect.com/science/article/pii/S2949747724000411


Abstract

This work addresses the proper application of machine-learning screening to direct propane dehydrogenation (PDH) and oxidative dehydrogenation (ODH) of propane, the two main routes for converting propane to propylene, both of which are characterized by limited experimental data. Current studies mainly adopt a trial-and-error strategy, which is time-consuming and raises environmental and health concerns owing to the release of chemical waste. This motivates the introduction of a data-driven research paradigm to mitigate the shortcomings of the traditional trial-and-error strategy; such a paradigm, however, relies on a large quantity of high-quality data. In this work, a dataset covering PDH and ODH data was constructed, and the performance of machine-learning algorithms in the study of light alkane activation was evaluated. On this basis, a strategy appropriate for small datasets is proposed: for small, unbalanced datasets, it is sensible to train the model on the dataset as a whole rather than to fuse multiple specific models trained on smaller subsets of the data. The results show that models trained with ensemble algorithms gave the best predictions of propylene selectivity: CatBoost and random forest for PDH, and LightGBM for ODH. Based on the optimal model, the key influencing factors in PDH and ODH were identified. This study demonstrates the proper use of a data-driven strategy in catalytic science; the strategy can be adopted for other scientific problems that suffer from limited high-quality data, and it can contribute to new understanding, e.g. the rational design and optimization of catalytic systems.
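The pooled-versus-split comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the feature names (`temperature`, `loading`, `is_odh`) and the toy selectivity formula are hypothetical, and scikit-learn's random forest stands in for CatBoost and LightGBM to keep the example dependency-light. It makes no claim about which approach scores better on real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

FEATURE_NAMES = ["temperature", "loading", "is_odh"]  # hypothetical descriptors


def make_synthetic_dataset(n_pdh=120, n_odh=40, seed=0):
    """Build a small, unbalanced toy dataset (synthetic, not real catalysis data)."""
    rng = np.random.default_rng(seed)
    n = n_pdh + n_odh
    temperature = rng.uniform(400.0, 650.0, n)  # assumed reaction temperature, deg C
    loading = rng.uniform(0.1, 10.0, n)         # assumed active-metal loading, wt%
    is_odh = np.concatenate([np.zeros(n_pdh), np.ones(n_odh)])  # 0 = PDH, 1 = ODH
    # Toy target standing in for propylene selectivity (%); the formula is invented.
    selectivity = (
        90.0
        - 0.05 * (temperature - 400.0)
        + 1.5 * np.log(loading)
        - 15.0 * is_odh
        + rng.normal(0.0, 2.0, n)
    )
    X = np.column_stack([temperature, loading, is_odh])
    return X, selectivity


def cv_scores(X, y, seed=0):
    """Return 5-fold cross-validated R^2 scores for a random forest regressor."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="r2")


def pooled_cv_scores(X, y, seed=0):
    """Whole-dataset strategy: one model, with reaction type as a feature."""
    return cv_scores(X, y, seed)


def per_reaction_cv_scores(X, y, seed=0):
    """Split strategy: one model per reaction type, each trained on its own subset."""
    is_odh = X[:, 2] == 1.0
    return {
        "PDH": cv_scores(X[~is_odh], y[~is_odh], seed),
        "ODH": cv_scores(X[is_odh], y[is_odh], seed),
    }


def rank_features(X, y, names=FEATURE_NAMES, seed=0):
    """Fit on all data and rank features by impurity-based importance."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [(names[i], float(model.feature_importances_[i])) for i in order]


if __name__ == "__main__":
    X, y = make_synthetic_dataset()
    print("pooled mean R2:", pooled_cv_scores(X, y).mean())
    for name, scores in per_reaction_cv_scores(X, y).items():
        print(name, "mean R2:", scores.mean())
    print("feature ranking:", rank_features(X, y))
```

In the pooled variant the reaction type enters as an ordinary feature, so the minority class (here ODH) shares the trees fitted on the majority class. The split variant leaves each model with only its own subset, which is the situation the abstract cautions against for small, unbalanced datasets.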

