Prediction of ionic liquid toxicity by interpretable machine learning

IF 3.7; Zone 3 (Engineering and Technology); JCR Q2 (ENGINEERING, CHEMICAL)
Haijun Feng, Li Jiajia, Zhou Jian
Journal: Chinese Journal of Chemical Engineering, Volume 84, Pages 201-210
Publication date: 2025-08-01
Publication type: Journal Article
DOI: 10.1016/j.cjche.2025.04.018
URL: https://www.sciencedirect.com/science/article/pii/S1004954125002125
Citation count: 0

Abstract

The potential toxicity of ionic liquids (ILs) affects their applications, and controlling that toxicity is one of the key issues in their use. To understand their toxicity-structure relationships and promote greener applications, six machine learning algorithms, namely Bagging, Adaptive Boosting (AdaBoost), Gradient Boosting (GBoost), Stacking, Voting and Categorical Boosting (CatBoost), are used to model the toxicity of ILs on four distinct datasets: the leukemia rat cell line IPC-81, acetylcholinesterase (AChE), Escherichia coli (E. coli) and Vibrio fischeri. Molecular descriptors derived from simplified molecular input line entry system (SMILES) strings are used to characterize the ILs. All models are assessed by the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and correlation coefficient (R²). Additionally, an interpretation model based on SHapley Additive exPlanations (SHAP) is built to determine the positive and negative effects of each molecular feature on toxicity. With additional parameters and complexity, the CatBoost model outperforms the other models, making it a more reliable model for IL toxicity prediction. The model interpretation indicates that the most significant positive features (SMR_VSA5, PEOE_VSA8, Kappa2, PEOE_VSA6 and EState_VSA1) increase the toxicity of ILs as their values rise, while the most significant negative features (VSA_EState7, EState_VSA8, PEOE_VSA9 and FpDensityMorgan1) decrease the toxicity as their values rise. In addition, an IL's toxicity grows as its average molecular weight and number of pyridine rings increase, and falls as its number of hydrogen bond acceptors increases. These findings offer a theoretical foundation for the rapid screening and synthesis of environmentally benign ILs.
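To make the evaluation and interpretation steps in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It computes the four reported metrics (MSE, RMSE, MAE, R², with R² taken here as 1 minus the ratio of residual to total sum of squares, which is an assumption about the paper's definition) and the exact Shapley values that SHAP approximates. The model `toy_toxicity`, its coefficients, the baseline and the sample values are all hypothetical and chosen only for illustration; the paper's models are tree ensembles explained with SHAP, whereas this sketch swaps in a tiny linear function and brute-force subset enumeration.

```python
import math
from itertools import combinations


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired observed/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumed definition: R^2 = 1 - SS_res / SS_tot.
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    The cost is exponential in the number of features (toy cases only).
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(point)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical toy model: the coefficients are invented, not fitted to any data.
def toy_toxicity(d):
    return 1.0 + 0.5 * d["SMR_VSA5"] - 0.25 * d["PEOE_VSA9"] + 0.125 * d["Kappa2"]


# Hypothetical descriptor values for one IL and for a reference point.
baseline = {"SMR_VSA5": 0.0, "PEOE_VSA9": 0.0, "Kappa2": 0.0}
sample = {"SMR_VSA5": 4.0, "PEOE_VSA9": 8.0, "Kappa2": 2.0}

shap_values = exact_shapley(toy_toxicity, sample, baseline)
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

In this toy setting a positive Shapley value means the descriptor pushes the predicted toxicity above the baseline prediction and a negative value pulls it below, which is the sense in which the abstract speaks of positive and negative features. The signs here follow directly from the invented coefficients and say nothing about the paper's fitted models.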
Source journal
Chinese Journal of Chemical Engineering (Engineering and Technology: Chemical Engineering)
CiteScore: 6.60
Self-citation rate: 5.30%
Articles published: 4309
Review time: 31 days
About the journal: The Chinese Journal of Chemical Engineering (monthly, started in 1982) is the official journal of the Chemical Industry and Engineering Society of China and is published by the Chemical Industry Press Co. Ltd. The aim of the journal is to develop the international exchange of scientific and technical information in the field of chemical engineering. It publishes original research papers that cover the major advancements and achievements in chemical engineering in China, as well as some articles from overseas contributors. The journal's topics include chemical engineering, chemical technology, biochemical engineering, energy and environmental engineering, and other relevant fields. Papers are published on the basis of their relevance to theoretical research, practical application or potential industrial use, as Research Papers, Communications, Reviews and Perspectives. Prominent domestic and overseas chemical experts and scholars have been invited to form an International Advisory Board and the Editorial Committee. The journal is recognized among Chinese academia and industry as a reliable source of information on chemical engineering research, both domestic and international.