Toxicity prediction of insecticides and pesticides via machine learning approach

IF 4.0 | Zone 1 (Agricultural and Forestry Sciences) | JCR Q2 (Biochemistry & Molecular Biology)
Priyansh Singh, Chandra Prakash Gupta, Sarvesh Namdeo, Vimal Chandra Srivastava
Pesticide Biochemistry and Physiology, vol. 215, Article 106652. Published 2025-08-21. DOI: 10.1016/j.pestbp.2025.106652
https://www.sciencedirect.com/science/article/pii/S0048357525003657
Citations: 0

Abstract

Pesticides are commonly used to protect crops, but their potential toxicity poses significant environmental and health risks. This study evaluates seven machine learning (ML) models for predicting key toxicity factors of pesticides: Random Forest (RF), Extreme Gradient Boosting (XGB), Gradient Boosted Decision Tree (GBDT), Categorical Boosting (CatBoost), Light Gradient-Boosting Machine (LGBM), and two stacked models (RF + XGB and RF + LGBM). The models were designed to estimate the Bio-Concentration Factor (BCF), the n-octanol-water Partition Coefficient (Kow), and the Lethal Dose-50 (LD50), using a dataset of 244 pesticides with over 160 features such as molecular weight, temperature, solubility, number of rings, and partition coefficient. The dataset was split into 90% training and 10% testing sets. The RF + LGBM stacked model achieved the best performance for BCF prediction, with a coefficient of determination (R²) of 0.89 and a Mean Absolute Percentage Error (MAPE) of 12.72%. CatBoost performed best for Kow, with an R² of 0.88, a Mean Squared Error (MSE) of 0.364, and a MAPE of 22.38%. For LD50, the RF + XGB stacked model was the most accurate, with an R² of 0.75 and a MAPE of 8.5%. SHapley Additive exPlanations (SHAP) analysis revealed that log P, water solubility, and SLogP were the most influential features across all models. The study shows that machine learning can predict pesticide toxicity and sets the stage for future research in predictive toxicology, environmental monitoring, and sustainable pesticide regulation, ultimately contributing to more responsible and data-driven agricultural practices.
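
For readers unfamiliar with the evaluation quantities quoted in the abstract, the following is a minimal sketch of a 90/10 split and of the standard definitions of R², MSE, and MAPE. It is not the authors' code. The function names, the random seed, the rounding of the test-set size, and the toy values are all assumptions made for illustration; the abstract does not state how the split was rounded or seeded.

```python
import random


def split_90_10(items, seed=0):
    """Shuffle a copy of `items` and hold out roughly 10% as a test set.

    The seed and the use of round() for the test size are assumptions;
    the source only says "90% training and 10% testing".
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(0.10 * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero.
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # 244 placeholder indices standing in for the 244 pesticides.
    train, test = split_90_10(range(244))
    print(len(train), len(test))  # 220 24 under the rounding assumed here

    # Toy values, not data from the study.
    y_true = [1.0, 2.0, 4.0, 5.0]
    y_pred = [1.1, 1.8, 4.4, 5.0]
    print(r2_score(y_true, y_pred), mse(y_true, y_pred), mape(y_true, y_pred))
```

With only about two dozen held-out compounds, each test-set metric rests on few points, so the reported values are best read together with the split size.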


Source journal
CiteScore: 7.00
Self-citation rate: 8.50%
Articles published: 238
Review time: 4.2 months
About the journal: Pesticide Biochemistry and Physiology publishes original scientific articles pertaining to the mode of action of plant protection agents such as insecticides, fungicides, herbicides, and similar compounds, including nonlethal pest control agents, biosynthesis of pheromones, hormones, and plant resistance agents. Manuscripts may include a biochemical, physiological, or molecular study for an understanding of comparative toxicology or selective toxicity of both target and nontarget organisms. Particular interest will be given to studies on the molecular biology of pest control, toxicology, and pesticide resistance. Research areas emphasized include the biochemistry and physiology of:
• Comparative toxicity
• Mode of action
• Pathophysiology
• Plant growth regulators
• Resistance
• Other effects of pesticides on both parasites and hosts