Haoran Wang, Junzhe Dai, Sikun Liu, Xiaohan Huang, Zhihao Xie, Zujian Wu, Tianchi Liang, Gang Lu
Journal of Hazardous Materials Advances, vol. 20, Article 100882. Published 2025-09-04. Impact factor: 7.7; JCR: Q2 (Engineering, Environmental).
DOI: 10.1016/j.hazadv.2025.100882
https://www.sciencedirect.com/science/article/pii/S2772416625002931
Nitrogen-containing heterocyclic structure-toxicity reliant mechanism based EC50 toxicity prediction models for carbamazepine’s transformation products
Carbamazepine (CBZ) and its transformation products (TPs) are frequently detected in aquatic environments, and their long-term presence has been linked to microbial antibiotic resistance and global health risks. In this study, we developed high-precision regression and classification models to predict the toxicity of CBZ's TPs, using Vibrio fischeri EC50 bioassay data. The models were trained on experimentally determined toxicity data for 38 nitrogen-containing heterocyclic compounds (NHCs) and validated on 11 CBZ TPs as an external validation set. Eight machine learning algorithms were used to train regression models, and six were used to train classification models. To address overfitting caused by the limited dataset size, variational autoencoder (VAE)-based data augmentation expanded the training dataset from 38 to 200 samples. Among the trained models, support vector regression (SVR) and gradient boosting machines (GBM) were identified as the optimal regression and classification models, respectively. Additionally, Shapley additive explanations (SHAP) analysis was used to identify the key molecular features contributing to toxicity, highlighting the critical roles of heterocyclic structures, topological properties, and nitrogen atom characteristics of TPs in determining their toxicity. The results showed that models trained on structurally similar NHCs were robust. This work provides a reliable framework for assessing the toxicity of CBZ's TPs in environmental monitoring and ecotoxicity risk assessment.
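The train-then-externally-validate workflow described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and uses synthetic placeholder data, not the paper's descriptors or EC50 values. Only the set sizes (38 training compounds, 11 external compounds) and the model families (SVR for regression, gradient boosting for classification) come from the abstract. The number of descriptors, the hyperparameters, and the median-based toxicity threshold are hypothetical choices made for illustration, and the VAE augmentation and SHAP steps are omitted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Set sizes follow the abstract; the descriptor count is an assumption.
N_TRAIN, N_EXTERNAL, N_FEATURES = 38, 11, 5

# Synthetic stand-ins for molecular descriptors and log-scaled EC50 values.
X_train = rng.normal(size=(N_TRAIN, N_FEATURES))
X_ext = rng.normal(size=(N_EXTERNAL, N_FEATURES))
true_w = rng.normal(size=N_FEATURES)
y_train = X_train @ true_w + rng.normal(scale=0.1, size=N_TRAIN)
y_ext = X_ext @ true_w + rng.normal(scale=0.1, size=N_EXTERNAL)

# Regression: SVR with feature scaling (hyperparameters are illustrative).
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
svr.fit(X_train, y_train)
y_ext_pred = svr.predict(X_ext)
r2_ext = svr.score(X_ext, y_ext)  # R^2 on the external set

# Classification: label compounds below the training median as "more toxic"
# (lower EC50 means higher toxicity). The threshold is hypothetical.
threshold = np.median(y_train)
labels_train = (y_train < threshold).astype(int)
labels_ext = (y_ext < threshold).astype(int)

gbm = GradientBoostingClassifier(random_state=0)
gbm.fit(X_train, labels_train)
acc_ext = gbm.score(X_ext, labels_ext)  # accuracy on the external set

print(f"External R^2 (SVR): {r2_ext:.3f}")
print(f"External accuracy (GBM): {acc_ext:.3f}")
```

Both models are fitted only on the training compounds, so the external scores give an estimate of how well the models transfer to compounds they have not seen, which is the role the 11 CBZ TPs play in the study.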