Development of an Interpretable Machine Learning Model for Neurotoxicity Prediction of Environmentally Related Compounds
Yuxing Hao, Zhihui Duan, Lizheng Liu, Qiao Xue, Wenxiao Pan, Xian Liu, Aiqian Zhang, Jianjie Fu
Environmental Science & Technology. Published 2025-04-30. DOI: 10.1021/acs.est.5c03311
The rising prevalence of nervous system disorders has become a significant global health challenge, and environmental pollutants have been identified as key contributors. However, the large number of environmentally related compounds, combined with the low efficiency of traditional testing methods, has left substantial gaps in neurotoxicity data. In this study, we developed a robust and interpretable neurotoxicity prediction model using a high-quality data set. To identify the best predictive model, we evaluated three molecular representation methods (molecular fingerprints, molecular descriptors, and molecular graphs) in combination with six traditional machine learning (ML) algorithms and two deep learning (DL) approaches. The optimal model, which combines molecular fingerprints and descriptors with eXtreme Gradient Boosting (XGBoost), achieved a training accuracy of 0.93 and an area under the curve (AUC) of 0.99, outperforming the other ML and DL models while remaining interpretable. The model was then used to screen 1170 compounds detected in human blood and successfully generated predictions for 1145 of them. For the 89 compounds with known neurotoxicity data, its predictions reached an accuracy of 0.74. The model identified 821 potentially neurotoxic compounds, including 36 detected at high concentrations that warrant further study. An online platform (http://www.envwind.site/tools.html) was developed to make the model more widely accessible. This model offers an efficient tool for predicting neurotoxicity and managing environmental health risks.
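The two headline metrics above, accuracy and AUC, have standard definitions. As a minimal, hedged sketch (not the authors' code; the abstract does not say which software they used), the Python below shows how a fingerprint bit vector and a descriptor vector could be concatenated into a single feature row, as the combined fingerprint-plus-descriptor representation implies, and how accuracy and ROC AUC are computed from a classifier's predicted labels and scores. `combine_features`, `accuracy`, and `roc_auc` are hypothetical helper names, the 0.5 decision threshold is an assumption, and the toy labels and probabilities are invented for illustration; they are not data from the study. The XGBoost training step itself is omitted.

```python
from typing import Sequence


def combine_features(fingerprint_bits: Sequence[int],
                     descriptors: Sequence[float]) -> list[float]:
    """Concatenate a binary fingerprint and a descriptor vector into one
    feature row (hypothetical helper; the paper's exact features are not
    given in the abstract)."""
    return [float(b) for b in fingerprint_bits] + [float(d) for d in descriptors]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the known label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive
    (label 1, neurotoxic) is scored above a randomly chosen negative
    (label 0). Ties count one half (Mann-Whitney U formulation).
    Quadratic in the number of compounds, which is fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy example: 5 compounds with known labels and model scores.
    labels = [1, 1, 0, 0, 1]
    probs = [0.9, 0.6, 0.4, 0.7, 0.8]
    # Assumed 0.5 threshold to turn probabilities into class labels.
    preds = [1 if p >= 0.5 else 0 for p in probs]
    print(accuracy(labels, preds))        # → 0.8
    print(round(roc_auc(labels, probs), 3))  # → 0.833
```

AUC is threshold-free, which is why it can sit well above accuracy for the same model. The same kind of arithmetic applied to the screening figures, assuming the 821 flagged compounds are counted among the 1145 with predictions, gives coverage of 1145/1170 ≈ 97.9% and a flagged share of 821/1145 ≈ 71.7%.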