Structure-based machine learning with standardized feature selection for screening MOFs with high ammonia capture capacity

Impact factor 7.8 · Zone 2: Environmental Science and Ecology · JCR Q1: Engineering, Chemical
Haokun Ti, Letian Yang, Weihao Yan, Yongdong Chen, Fu Yang, Yujing Yue, Yixue Sun, Xia Li, Yechao Niu, Shi Li
Process Safety and Environmental Protection, vol. 203, Article 107955. Published 2025-09-30. Journal article, not open access.
DOI: 10.1016/j.psep.2025.107955
https://www.sciencedirect.com/science/article/pii/S0957582025012224
Citations: 0

Abstract

Structure-based machine learning with standardized feature selection for screening MOFs with high ammonia capture capacity
It is essential that ammonia, a corrosive and hazardous gas widely used as an industrial feedstock and emerging hydrogen carrier, be effectively captured to safeguard process safety and reduce air pollution in industrial operations. Metal–organic frameworks (MOFs), with tunable porosity and surface chemistries, are promising candidates for NH3 capture. To accelerate the identification of high-performance MOFs, we develop a standardized machine learning framework integrating diverse structural descriptors with a multi-step feature selection strategy. A 198-dimensional feature space is constructed by combining conventional geometrical descriptors with multiple categories of RDKit-derived features. Feature selection involves four steps: variance thresholding, LightGBM-based importance ranking, Pearson correlation filtering, and forward feature selection, yielding a compact and informative subset that is used to construct a machine learning model. To enhance model interpretability, a multilevel interpretability analysis is further conducted to quantify the contributions of selected features and reveal their structure-performance relationships. Beyond model construction, an integrated assessment for engineering application is performed, including transferability validation across databases, identification of design windows, volumetric capacity ranking, breakthrough time estimation, and preliminary considerations of stability and regeneration. This study offers an efficient approach to accelerate the identification of high-performance MOFs with reduced experimental burden, and its engineering-oriented assessments consolidate the framework’s applicability to process-level practice.
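The four-step feature selection named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no thresholds, so `var_min`, `top_k`, and `corr_max` are placeholder values; the importance scores are taken as an input (in the paper they come from LightGBM, which is not called here); and `score_fn` is a hypothetical caller-supplied function, for example a cross-validated score of a model trained on the candidate subset.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence

Columns = Dict[str, List[float]]  # feature name -> column of values


def variance(xs: Sequence[float]) -> float:
    """Population variance of one feature column."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(
    columns: Columns,
    importance: Dict[str, float],             # assumed input, e.g. from LightGBM
    score_fn: Callable[[List[str]], float],   # hypothetical subset scorer
    var_min: float = 1e-8,                    # assumed threshold
    top_k: int = 50,                          # assumed cut-off
    corr_max: float = 0.9,                    # assumed threshold
) -> List[str]:
    # Step 1: variance thresholding drops (near-)constant features.
    kept = [f for f, col in columns.items() if variance(col) > var_min]

    # Step 2: rank the survivors by importance and keep the top_k.
    ranked = sorted(kept, key=lambda f: importance.get(f, 0.0), reverse=True)
    ranked = ranked[:top_k]

    # Step 3: Pearson filtering. Walk the ranking from most to least
    # important and drop any feature whose |r| with an already retained
    # feature exceeds corr_max.
    decorrelated: List[str] = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[g])) <= corr_max
               for g in decorrelated):
            decorrelated.append(f)

    # Step 4: greedy forward selection. Each round adds the candidate that
    # gives the highest score; stop when no candidate improves the score.
    selected: List[str] = []
    best = float("-inf")
    remaining = list(decorrelated)
    while remaining:
        trial_score, trial_feat = max(
            ((score_fn(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if trial_score <= best:
            break
        selected.append(trial_feat)
        remaining.remove(trial_feat)
        best = trial_score
    return selected
```

Passing the importance scores and the scorer in as arguments keeps the sketch free of third-party dependencies; in practice both would come from the trained model, and the order of the filters would follow the paper's full text rather than this simplified reading of the abstract.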
Source journal
Process Safety and Environmental Protection (Environmental Science / Engineering: Chemical)
CiteScore: 11.40
Self-citation rate: 15.40%
Articles published: 929
Review time: 8.0 months
About the journal: Process Safety and Environmental Protection (PSEP) is an international journal that publishes high-quality, original research papers in engineering, specifically on the safety of industrial processes and environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice. PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. It also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers. PSEP's articles are abstracted and indexed by ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC.