Authors: Ranran Geng, Wenjuan Deng, Zhiqiang Hu, Jianlei Wang, Yuanyuan Zhao, Baichuan Zhou and Guocai Tian
Journal: Physical Chemistry Chemical Physics, volume 27, pages 14482-14491 (published 2025-06-17; Journal Article)
DOI: 10.1039/D5CP01972A
URL: https://pubs.rsc.org/en/content/articlelanding/2025/cp/d5cp01972a
Journal impact factor: 2.9; JCR: Q3 (Chemistry, Physical)
Machine learning with new functional structure descriptors for design and screening of ionic liquids in CO2 efficient capture
Carbon dioxide emission reduction, conversion and utilization are pressing and difficult issues worldwide. As a new class of green solvents, ionic liquids (ILs) are widely used in CO2 capture and conversion, but the number of possible ILs is enormous (more than 10¹⁸), so selecting and screening appropriate ILs for CO2 capture is an urgent problem. It is therefore of great significance to establish a quantitative structure–property relationship (QSPR) for CO2 capture by ILs. From the practical standpoint of IL design and synthesis, a new functional structure descriptor (FSD) based on the group contribution (GC) method was constructed. At the same time, the traditional machine learning practice of adding dimensions to raise accuracy is reconsidered, and the feasibility of reducing dimensionality while maintaining accuracy is examined. A dimensionless molecular descriptor, CORE, is constructed. Based on these two new molecular descriptors, the performance of six common ensemble learning models (CatBoost, LightGBM, XGBoost, GBDT, RF and AdaBoost) for CO2 solubility in ILs is compared. All six ensemble models perform well, with CatBoost performing best. The CatBoost-FSD model achieves an R² of 0.9945 and an MAE of 0.0108, while the CatBoost-CORE model achieves an R² of 0.9925 and an MAE of 0.0120. The interpretability of the CatBoost-FSD model is analyzed, and the key features are identified. Based on the CORE descriptor, the best experimental conditions are obtained, and nine ILs with superior performance are recommended.
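The abstract reports model quality as R² and MAE. As a reading aid, here is a minimal sketch of how these two metrics are conventionally defined, using only the Python standard library. The function names and the sample numbers are illustrative assumptions. They are not taken from the paper, and the authors' exact evaluation protocol (data split, averaging) is not stated in the abstract.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """MAE: the average of |y_i - yhat_i| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only; NOT data from the paper.
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.11, 0.19, 0.31, 0.39]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.4f}  R2={r2:.4f}")
```

A lower MAE and an R² closer to 1 both indicate a closer fit, which is the sense in which the abstract ranks CatBoost-FSD slightly ahead of CatBoost-CORE.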
About the journal:
Physical Chemistry Chemical Physics (PCCP) is an international journal co-owned by 19 physical chemistry and physics societies from around the world. This journal publishes original, cutting-edge research in physical chemistry, chemical physics and biophysical chemistry. To be suitable for publication in PCCP, articles must include significant innovation and/or insight into physical chemistry; this is the most important criterion that reviewers and Editors will judge against when evaluating submissions.
The journal has a broad scope and welcomes contributions spanning experiment, theory, computation and data science. Topical coverage includes spectroscopy, dynamics, kinetics, statistical mechanics, thermodynamics, electrochemistry, catalysis, surface science, quantum mechanics, quantum computing and machine learning. Interdisciplinary research areas such as polymers and soft matter, materials, nanoscience, energy, surfaces/interfaces, and biophysical chemistry are welcomed if they demonstrate significant innovation and/or insight into physical chemistry. Joint experimental/theoretical studies are particularly appreciated when complementary and based on up-to-date approaches.