Establishment of an Integrated Model for Predicting Compound Mutagenicity with a Feature Importance Analysis.

IF 5.3; Zone 2 (Chemistry); JCR Q1 (Chemistry, Medicinal)
Chao-Hsu Yang, Tony Eight Lin, Jui-Hua Hsieh, Kai-Cheng Hsu, Pei-Te Chiueh
Journal of Chemical Information and Modeling. Journal Article. Published 2025-10-21. DOI: 10.1021/acs.jcim.5c01586 (https://doi.org/10.1021/acs.jcim.5c01586)

Abstract

Assessing the mutagenicity of chemical compounds is crucial for ensuring their safety and minimizing potential environmental and public health risks. However, traditional mutagenicity assessments, such as the Ames test, are time-consuming and resource-intensive, and they are often limited in their capacity to screen large numbers of compounds. To address this gap, predictive models powered by deep learning offer a promising alternative for rapid and cost-effective mutagenicity screening. In this study, we propose an integrated deep learning framework that uses diverse molecular features to predict compound mutagenicity. Of the 5866 compounds in the data set, 5279 were used for model training and the remaining 587 for model evaluation. A total of 78 integrated models were developed by systematically combining 13 types of molecular descriptors and fingerprints. The MACCS-Mordred model performed best, achieving a balanced accuracy of 0.885 and a precision of 0.922 on the testing data set. In addition, we performed an activity cliff analysis to examine potential sources of mispredictions. Applicability domain analysis further supported the robustness of the model, indicating that most compounds in our data set fell within the reliable prediction space. Notably, feature importance analysis revealed that mutagenic compounds are more likely to contain nitrogen-containing and ring-related substructures, offering insight into the structural characteristics associated with mutagenic risk. Our results support AI-enabled screening tools for prioritizing hazardous compounds and improving early-stage chemical risk assessment. This work provides practical value for environmental monitoring and regulatory decision-making.
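The counts in the abstract can be sanity-checked with a short sketch. It assumes, since the abstract does not say so explicitly, that each integrated model pairs two distinct feature types, which would give C(13, 2) = 78 models. Only MACCS and Mordred are named in the abstract; the other eleven labels below are placeholders, and the confusion-matrix counts in the usage lines are invented purely to illustrate the standard definitions of balanced accuracy and precision. None of this is the authors' code.

```python
from itertools import combinations

# Only MACCS and Mordred are named in the abstract; the remaining eleven
# labels are hypothetical placeholders, not the authors' feature sets.
FEATURE_TYPES = ["MACCS", "Mordred"] + [f"feature_{i:02d}" for i in range(3, 14)]


def pairwise_models(feature_types):
    """Return every unordered pair of distinct feature types."""
    return list(combinations(feature_types, 2))


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


pairs = pairwise_models(FEATURE_TYPES)
print(len(pairs))  # 78 pairs from 13 feature types

# Train/test sizes reported in the abstract: roughly a 90/10 split.
n_train, n_test = 5279, 587
print(round(n_test / (n_train + n_test), 3))  # 0.1

# Invented counts for illustration only (not the paper's results).
print(round(balanced_accuracy(tp=90, fn=10, tn=80, fp=20), 3))  # 0.85
print(round(precision(tp=90, fp=20), 3))  # 0.818
```

Balanced accuracy averages the per-class recall, so it is less sensitive to class imbalance than plain accuracy, which is presumably why it is reported alongside precision here.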