Heat Capacity of Ionic Liquids: Toward Interpretable Chemical Structure-Based Machine Learning Approaches

IF 5.3 · CAS Zone 2 (Chemistry) · JCR Q1 (Chemistry, Medicinal)

Journal of Chemical Information and Modeling · Pub Date: 2025-04-10 · DOI: 10.1021/acs.jcim.5c00238

Ali Esmaeili, Hesamedin Hekmatmehr, Mohammad Moheisen, Saeid Atashrouz*, Ali Abedi, and Ahmad Mohaddespour*

Volume 65 (8), pages 4010–4026 · https://pubs.acs.org/doi/10.1021/acs.jcim.5c00238

Citations: 0

Abstract


This study focuses on predicting the heat capacity of pure liquid-phase ionic liquids (ILs) using machine learning models from various categories, including support vector machines, instance-based learning, ensemble learning, and neural networks, with linear regression serving as a baseline. A key aim of this work is not only to achieve accurate predictions but also to ensure the interpretability of the results, addressing a gap often overlooked in predictive modeling studies. To accomplish this, we curated and cleaned a comprehensive data set of 13,893 data points covering 322 ILs, using temperature and chemical structure-based features as inputs. We evaluated model performance and conducted a thorough interpretability analysis to reveal the patterns of the top-performing model’s predictions, ensuring that they are understandable. All models outperformed the baseline, with XGBoost (eXtreme Gradient Boosting) from the ensemble learning category achieving the best results, with total RMSE, R², and AARD (%) values of 11.389, 0.997, and 1.212%, respectively. Shallow neural networks also performed competitively, suggesting that complex deep learning architectures may not be necessary. Both 10-fold and leave-one-IL-out (LOILO) cross-validation further validated the robustness of these results. Importantly, the interpretability analysis identified key factors influencing heat capacity predictions, such as anion size (e.g., NTf₂ and FAP) and alkyl chain length. These factors were validated by testing the model on previously unseen IL examples. Additionally, a user-friendly web application was developed to make predictions, allowing users to input chemical groups or select compounds from a predefined list of 1633 ILs. This study underscores the importance of combining diverse modeling approaches with robust interpretability techniques to achieve reliable and explainable predictions for IL heat capacity.
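The abstract reports RMSE, R², and AARD (%) and mentions leave-one-IL-out (LOILO) cross-validation without spelling out either. The following is a minimal sketch of how these quantities are commonly computed, not the authors' code. The AARD definition used here (mean of |predicted − measured| / measured, times 100) is the usual convention and is an assumption, as are all function names and the toy values in the usage note.

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aard_percent(y_true, y_pred):
    """Average absolute relative deviation in percent (assumed definition)."""
    relative = [abs((p - t) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(relative) / len(y_true)


def leave_one_il_out(il_ids):
    """Yield (held_out_il, train_indices, test_indices), one split per IL.

    Every data point of the held-out ionic liquid goes to the test set, so
    the model is always scored on a compound it never saw during training.
    """
    indices_by_il = defaultdict(list)
    for index, il in enumerate(il_ids):
        indices_by_il[il].append(index)
    for il, test_idx in indices_by_il.items():
        train_idx = [i for i, other in enumerate(il_ids) if other != il]
        yield il, train_idx, test_idx
```

As a usage illustration with made-up numbers, `aard_percent([100.0, 200.0], [110.0, 190.0])` averages relative errors of 10% and 5%, and `list(leave_one_il_out(["A", "A", "B"]))` produces two splits, one holding out all points of IL "A" and one holding out IL "B". Grouping by IL rather than by data point matters because a data set with many temperature points per compound would otherwise leak each compound into both the training and test folds.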


Source journal

Journal of Chemical Information and Modeling (Chemistry: Chemistry, Multidisciplinary)

CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.