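An effective machine learning model for rat acute oral toxicity prediction of emerging chemicals: multi-domain applications and structure-activity relationships.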

IF 2.3 | Zone 3, Environmental Science and Ecology | JCR Q3, CHEMISTRY, MULTIDISCIPLINARY

SAR and QSAR in Environmental Research. Pub Date: 2025-06-01. Epub Date: 2025-07-31. DOI: 10.1080/1062936X.2025.2531172

J Yan, Z Shen

Volume 36, Issue 6, pages 537-554. Publication type: Journal Article. Open access: no.

Citations: 0

Abstract


Given the widespread presence of emerging contaminants in the environment, assessing and ensuring their biosafety is urgent. Under the Globally Harmonized System (GHS), the LD₅₀ parameter of acute oral toxicity (AOT) is crucial for chemical safety classification. The limitations of animal testing have highlighted the need for alternative methods, and machine learning offers a new approach to predicting LD₅₀ through quantitative structure-activity relationship (QSAR) models. This study developed and optimized a machine learning model for LD₅₀ classification of emerging contaminants based on more than 6000 known AOT records. Using molecular descriptors and fingerprints, the model achieves an accuracy above 0.86 and a recall above 0.84, outperforming previous models. The model's robustness was confirmed across various types of emerging contaminants. Shapley additive explanations (SHAP) identified key descriptors such as BCUTp_1h, ATSC1pe, and SLogP_VSA4, while the information gain (IG) method highlighted alert substructures [P-O, P-S]. These findings suggest that compounds with high polarizability, mean electronegativity and significant surface area may adversely affect rats. This model enhances understanding of acute toxicity mechanisms and serves as a tool for early screening of safer compounds, promoting the design of greener chemicals.
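To make two ideas from the abstract concrete, the sketch below bins LD₅₀ values into GHS acute oral toxicity categories and then scores a binary substructure flag by information gain. It is a minimal illustration, not the authors' pipeline: the toy records, the "categories 1-3 are toxic" labelling, and all function names are assumptions introduced here. The category cut-offs follow the standard GHS oral thresholds (5, 50, 300, 2000, 5000 mg/kg), with Category 5 being optional in some jurisdictions.

```python
import math
from collections import Counter

# GHS acute oral toxicity upper bounds in mg/kg body weight.
# Category 5 (2000-5000 mg/kg) is optional and not adopted everywhere.
GHS_ORAL_BOUNDS = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]


def ghs_category(ld50_mg_per_kg):
    """Return the GHS acute oral category (1-5), or None above 5000 mg/kg."""
    for upper, category in GHS_ORAL_BOUNDS:
        if ld50_mg_per_kg <= upper:
            return category
    return None


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(has_substructure, labels):
    """Entropy reduction in `labels` when splitting on a binary flag."""
    present = [y for f, y in zip(has_substructure, labels) if f]
    absent = [y for f, y in zip(has_substructure, labels) if not f]
    # Weighted entropy of the two partitions; empty partitions are skipped.
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder


# Made-up toy records, NOT data from the study:
# (LD50 in mg/kg, does the molecule contain a P-S bond?)
toy = [(3.0, True), (40.0, True), (250.0, True), (1500.0, False),
       (3000.0, False), (6000.0, False), (800.0, False), (20.0, False)]

categories = [ghs_category(ld50) for ld50, _ in toy]
# Hypothetical binary labelling: GHS categories 1-3 count as "toxic".
labels = ["toxic" if c is not None and c <= 3 else "non-toxic"
          for c in categories]
flags = [flag for _, flag in toy]

ig = information_gain(flags, labels)
print(f"GHS categories: {categories}")
print(f"Information gain of the P-S flag: {ig:.3f} bits")
```

A substructure whose presence separates the classes well yields a gain close to the full label entropy, which is why high-gain fragments are natural candidates for structural alerts. In practice the flags would come from a cheminformatics toolkit's substructure search rather than hand-written booleans.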


Source journal

SAR and QSAR in Environmental Research (Environmental Science: Toxicology)

CiteScore

5.20

Self-citation rate

20.00%

Articles published

Review time

>24 weeks

Journal description: SAR and QSAR in Environmental Research is an international journal welcoming papers on the fundamental and practical aspects of the structure-activity and structure-property relationships in the fields of environmental science, agrochemistry, toxicology, pharmacology and applied chemistry. A unique aspect of the journal is the focus on emerging techniques for the building of SAR and QSAR models in these widely varying fields. The scope of the journal includes, but is not limited to, the topics of topological and physicochemical descriptors, mathematical, statistical and graphical methods for data analysis, computer methods and programs, original applications and comparative studies. In addition to primary scientific papers, the journal contains reviews of books and software and news of conferences. Special issues on topics of current and widespread interest to the SAR and QSAR community will be published from time to time.