Construction of interpretable ensemble learning models for predicting bioaccumulation parameters of organic chemicals in fish

IF 12.2 | Zone 1, Environmental Science and Ecology | JCR Q1, Engineering, Environmental
Minghua Zhu, Zijun Xiao, Tao Zhang, Guanghua Lu
Journal of Hazardous Materials, Volume 482, Article 136606. Published 2024-11-20.
DOI: 10.1016/j.jhazmat.2024.136606
https://www.sciencedirect.com/science/article/pii/S030438942403187X
Citation count: 0

Abstract

Accurate prediction of bioaccumulation parameters is essential for assessing the exposure, hazards, and risks of chemicals. However, most prediction models for bioaccumulation parameters are individual models built on a single algorithm and offer no model interpretation. As a result, their prediction accuracy is limited by the inherent constraints of that algorithm, and their interpretability is weak. Ensemble learning (EL), which combines multiple algorithms, coupled with the SHapley Additive exPlanations (SHAP) method, may overcome these limitations. Herein, EL models were constructed for three bioaccumulation parameters using datasets covering 2496 chemicals. The EL models predicted more accurately than both the individual models developed in this study and those from previous research, achieving a coefficient of determination of up to 0.861 on the validation sets. Applicability domains were characterized using a structure-activity landscape-based methodology (abbreviated as AD_SAL). The optimal EL models, together with the AD_SAL, were successfully used to predict bioaccumulation parameters for 4374 chemicals included in the Inventory of Existing Chemical Substances of China. Model interpretation using the SHAP method offered insight into the key features influencing bioaccumulation potential, including hydrophobicity, water solubility, polarizability, ionization potential, and molecular weight and volume. Overall, the study provides data and models to support the sound management and risk assessment of chemicals.
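
The abstract does not say which base algorithms, descriptors, or ensemble strategy the authors used, so the following is only a minimal sketch of the two ideas it names: combining several models into one prediction, and attributing that prediction to input features with Shapley values. Everything in it is an assumption made for illustration: the two toy base models, their coefficients, the two-feature input (loosely, a hydrophobicity-like and a size-like descriptor), the simple averaging rule, and the baseline sample. The Shapley values are computed exactly by enumerating feature subsets, which is only practical for a handful of features; the SHAP library relies on faster algorithms and is not used here.

```python
from itertools import combinations
from math import factorial


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ensemble_predict(models, x):
    """Simple averaging ensemble: mean of the base-model predictions."""
    return sum(model(x) for model in models) / len(models)


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only suitable for very small n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical base models (invented coefficients, not from the paper).
def linear_model(x):
    return 0.8 * x[0] - 0.1 * x[1]


def interaction_model(x):
    return 0.6 * x[0] + 0.05 * x[0] * x[1]


models = [linear_model, interaction_model]
x = [4.0, 3.0]          # sample to explain (made-up descriptor values)
baseline = [2.0, 2.5]   # reference sample (made-up)

phi = shapley_values(lambda z: ensemble_predict(models, z), x, baseline)
gap = ensemble_predict(models, x) - ensemble_predict(models, baseline)
print("Shapley values:", phi)
print("Sum of values:", sum(phi), "prediction minus baseline:", gap)
```

The final two printed numbers should agree: Shapley values are additive, so the per-feature contributions sum to the difference between the prediction for the sample and the prediction for the baseline. The `r_squared` helper is the same metric the abstract reports for the validation sets, shown here only as a definition.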


Source journal
Journal of Hazardous Materials
Category: Engineering and Technology, Environmental Engineering
CiteScore: 25.40
Self-citation rate: 5.90%
Articles published: 3059
Review time: 58 days
About the journal: The Journal of Hazardous Materials serves as a global platform for promoting cutting-edge research in the field of Environmental Science and Engineering. Our publication features a wide range of articles, including full-length research papers, review articles, and perspectives, with the aim of enhancing our understanding of the dangers and risks associated with various materials concerning public health and the environment. It is important to note that the term "environmental contaminants" refers specifically to substances that pose hazardous effects through contamination, while excluding those that do not have such impacts on the environment or human health. Moreover, we emphasize the distinction between wastes and hazardous materials in order to provide further clarity on the scope of the journal. We have a keen interest in exploring specific compounds and microbial agents that have adverse effects on the environment.