Toxicity prediction of insecticides and pesticides via machine learning approach
Authors: Priyansh Singh, Chandra Prakash Gupta, Sarvesh Namdeo, Vimal Chandra Srivastava
Journal: Pesticide Biochemistry and Physiology, volume 215, Article 106652 (impact factor 4.0; JCR Q2, Biochemistry & Molecular Biology)
Publication date: 2025-08-21
DOI: 10.1016/j.pestbp.2025.106652
URL: https://www.sciencedirect.com/science/article/pii/S0048357525003657
Citations: 0
Abstract
Pesticides are commonly used to protect crops, but their potential toxicity poses significant environmental and health risks. This study explores the effectiveness of seven machine learning (ML) models for predicting key toxicity factors of pesticides: Random Forest (RF), Extreme Gradient Boosting (XGB), Gradient Boosted Decision Tree (GBDT), Categorical Boosting (CatBoost), Light Gradient-Boosting Machine (LGBM), and two stacked models (RF + XGB and RF + LGBM). The models were designed to estimate the Bio-Concentration Factor (BCF), the n-octanol-water Partition Coefficient (Kow), and the Lethal Dose-50 (LD50), using a dataset of 244 pesticides with over 160 features such as molecular weight, temperature, solubility, number of rings, and partition coefficient. The dataset was split into 90% training and 10% testing sets. The RF + LGBM stacked model achieved the best performance for BCF prediction, with a coefficient of determination (R²) of 0.89 and a Mean Absolute Percentage Error (MAPE) of 12.72%. CatBoost performed best for Kow, with an R² of 0.88, a Mean Squared Error (MSE) of 0.364, and a MAPE of 22.38%. For LD50, the RF + XGB stacked model was the most accurate, with an R² of 0.75 and a MAPE of 8.5%. SHapley Additive exPlanations (SHAP) analysis revealed that log P, water solubility, and SLogP were the most influential features across all models. This study demonstrates the power of machine learning for toxicity prediction while also setting the stage for future research in predictive toxicology, environmental monitoring, and sustainable pesticide regulation, ultimately contributing to more responsible and data-driven agricultural practices.
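The abstract reports R², MSE, and MAPE on a 10% held-out test set. As a minimal sketch of how such figures are typically computed, the pure-Python example below implements the standard textbook definitions of the three metrics and a 90/10 index split. It is not the authors' code: the function names, the random seed, the rounding rule for the test-set size, and the toy values are all assumptions made for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero.
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def split_indices(n_samples, test_fraction=0.10, seed=0):
    """Shuffle sample indices and return (train, test) index lists.

    Assumption: the test-set size is rounded to the nearest integer;
    the abstract does not say how the authors rounded.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 244 compounds, as in the abstract; 10% held out for testing.
train_idx, test_idx = split_indices(244)
print(len(train_idx), len(test_idx))  # 220 24

# Toy values (hypothetical, not taken from the paper).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
print(f"R2   = {r2_score(y_true, y_pred):.2f}")
print(f"MSE  = {mse(y_true, y_pred):.2f}")
print(f"MAPE = {mape(y_true, y_pred):.2f} %")
```

With 244 compounds, a 10% test set holds only about 24 samples, so each test compound contributes roughly 4% of the MAPE average. A different random split could therefore shift the reported metrics noticeably.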
About the journal:
Pesticide Biochemistry and Physiology publishes original scientific articles pertaining to the mode of action of plant protection agents such as insecticides, fungicides, herbicides, and similar compounds, including nonlethal pest control agents, biosynthesis of pheromones, hormones, and plant resistance agents. Manuscripts may include a biochemical, physiological, or molecular study for an understanding of comparative toxicology or selective toxicity of both target and nontarget organisms. Particular interest will be given to studies on the molecular biology of pest control, toxicology, and pesticide resistance.
Research Areas Emphasized Include the Biochemistry and Physiology of:
• Comparative toxicity
• Mode of action
• Pathophysiology
• Plant growth regulators
• Resistance
• Other effects of pesticides on both parasites and hosts.