Establishment of an Integrated Model for Predicting Compound Mutagenicity with a Feature Importance Analysis.
Authors: Chao-Hsu Yang, Tony Eight Lin, Jui-Hua Hsieh, Kai-Cheng Hsu, Pei-Te Chiueh
Journal: Journal of Chemical Information and Modeling (impact factor 5.3; JCR Q1, Chemistry, Medicinal)
Publication date: 2025-10-21
Publication type: Journal Article
DOI: 10.1021/acs.jcim.5c01586 (https://doi.org/10.1021/acs.jcim.5c01586)
Citation count: 0
Abstract
Assessing the mutagenicity of chemical compounds is crucial for ensuring their safety and minimizing potential environmental and public health risks. However, traditional mutagenicity assessments, such as the Ames test, are time-consuming, resource-intensive, and often limited in their capacity to screen large numbers of compounds. To address this gap, predictive models powered by deep learning offer a promising alternative for rapid and cost-effective mutagenicity screening. In this study, we propose an integrated deep learning framework that uses diverse molecular features to predict compound mutagenicity. Of the 5866 compounds in the data set, 5279 were used for model training and the remaining 587 for model evaluation. A total of 78 integrated models were developed by systematically combining 13 types of molecular descriptors and fingerprints. The MACCS-Mordred model performed best, achieving a balanced accuracy of 0.885 and a precision of 0.922 on the testing data set. In addition, we performed an activity cliff analysis to examine potential sources of mispredictions. Applicability domain analysis further supported the robustness of the model, indicating that most compounds in our data set fell within the reliable prediction space. Notably, feature importance analysis revealed that mutagenic compounds are more likely to contain nitrogen-containing and ring-related substructures, offering insights into structural characteristics associated with mutagenic risk. Our results support AI-enabled screening tools for prioritizing hazardous compounds and improving early-stage chemical risk assessment. This work provides practical value for environmental monitoring and regulatory decision-making.
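The counts in the abstract can be checked with a short sketch. It assumes that each of the 78 integrated models pairs two of the 13 feature types, a reading consistent with the name "MACCS-Mordred" and with C(13, 2) = 78, although the abstract does not state this explicitly. Only MACCS and Mordred are named in the abstract, so the remaining feature names are hypothetical placeholders. The metric functions use the standard definitions of balanced accuracy and precision, not code from the paper.

```python
from itertools import combinations

# Only "MACCS" and "Mordred" come from the abstract; the other 11 names
# are hypothetical placeholders for the unnamed feature types.
feature_types = ["MACCS", "Mordred"] + [f"feature_{i}" for i in range(3, 14)]

# Assumption: each integrated model combines exactly two feature types.
pairs = list(combinations(feature_types, 2))
n_pairs = len(pairs)  # C(13, 2)

# Data split reported in the abstract.
total, n_train, n_test = 5866, 5279, 587
test_fraction = n_test / total


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on mutagens) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted mutagens that are true mutagens."""
    return tp / (tp + fp)


if __name__ == "__main__":
    print(f"pairwise models: {n_pairs}")
    print(f"train + test = {n_train + n_test}, test fraction = {test_fraction:.3f}")
    # Illustrative confusion-matrix counts only, not results from the paper.
    print(f"balanced accuracy: {balanced_accuracy(tp=8, fn=2, tn=6, fp=4):.3f}")
    print(f"precision: {precision(tp=8, fp=4):.3f}")
```

Under this pairing assumption the count comes out to 78, and the reported split corresponds to roughly a 90/10 train/test partition. The confusion-matrix counts in the usage lines are made up for illustration and do not reproduce the reported 0.885 or 0.922.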
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.