Explainable no-code OECD-compliant machine learning models to predict the mutagenic activity of polycyclic aromatic hydrocarbons and their radical cation metabolites
Andrés Halabi Diaz, Mario Duque-Noreña, Elizabeth Rincón, Eduardo Chamorro
{"title":"Explainable no-code OECD-compliant machine learning models to predict the mutagenic activity of polycyclic aromatic hydrocarbons and their radical cation metabolites","authors":"Andrés Halabi Diaz , Mario Duque-Noreña , Elizabeth Rincón , Eduardo Chamorro","doi":"10.1016/j.scitotenv.2025.179133","DOIUrl":null,"url":null,"abstract":"<div><div>Polycyclic aromatic hydrocarbons (PAHs) are <em>persistent pollutants</em> with well-known <em>genotoxic</em> and <em>mutagenic</em> effects, posing risks to ecosystems and human health. Their <em>hydrophobic nature</em> promotes accumulation in soils and aquatic environments, increasing exposure risks. Upon <em>metabolic activation</em>, PAHs generate reactive species that form <em>DNA adducts</em>, driving their mutagenic potential. This study presents an <em>OECD-compliant</em> methodology that integrates <em>conceptual density functional theory (CDFT)</em> calculations at the <em>GFN2-xTB</em> level with <em>machine learning</em> models to predict PAH mutagenicity. Using <em>quantum chemical descriptors</em> of <em>procarcinogens</em> and <em>radical cation metabolites</em> alongside <em>Ames test</em> data, key electronic properties linked to mutagenicity were identified. Feature selection consistently highlighted radical cation descriptors as key indicators of metabolic activation pathways. Machine learning models — including <em>SPAARC</em>, <em>Random Tree</em>, and <em>JCHAID</em> — achieved validation accuracies exceeding <em>89</em> <em>%</em>, with minimal <em>false-negative rates</em>, ensuring conservative predictions for environmental risk assessment. The <em>PSL</em> and <em>CDP</em> electrophilicity frameworks proved particularly effective in modeling DNA damage-related processes. This <em>no-code</em>, <em>freeware-based</em> methodology provides a scalable and cost-effective tool for assessing mutagenic risks in environmentally relevant conditions. 
The findings reinforce the importance of <em>metabolic activation</em>, validate the radical cation as a reliable proxy for this process, and demonstrate the predictive value of electronic properties in <em>QSAR</em> modeling. These insights support advances in <em>environmental toxicology</em> and contribute to improved strategies for <em>regulatory risk assessment</em>.</div></div>","PeriodicalId":422,"journal":{"name":"Science of the Total Environment","volume":"972 ","pages":"Article 179133"},"PeriodicalIF":8.0000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science of the Total Environment","FirstCategoryId":"93","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0048969725007685","RegionNum":1,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENVIRONMENTAL SCIENCES","Score":null,"Total":0}
Abstract
Polycyclic aromatic hydrocarbons (PAHs) are persistent pollutants with well-known genotoxic and mutagenic effects, posing risks to ecosystems and human health. Their hydrophobic nature promotes accumulation in soils and aquatic environments, increasing exposure risks. Upon metabolic activation, PAHs generate reactive species that form DNA adducts, driving their mutagenic potential. This study presents an OECD-compliant methodology that integrates conceptual density functional theory (CDFT) calculations at the GFN2-xTB level with machine learning models to predict PAH mutagenicity. Using quantum chemical descriptors of procarcinogens and radical cation metabolites alongside Ames test data, key electronic properties linked to mutagenicity were identified. Feature selection consistently highlighted radical cation descriptors as key indicators of metabolic activation pathways. Machine learning models — including SPAARC, Random Tree, and JCHAID — achieved validation accuracies exceeding 89%, with minimal false-negative rates, ensuring conservative predictions for environmental risk assessment. The PSL and CDP electrophilicity frameworks proved particularly effective in modeling DNA damage-related processes. This no-code, freeware-based methodology provides a scalable and cost-effective tool for assessing mutagenic risks in environmentally relevant conditions. The findings reinforce the importance of metabolic activation, validate the radical cation as a reliable proxy for this process, and demonstrate the predictive value of electronic properties in QSAR modeling. These insights support advances in environmental toxicology and contribute to improved strategies for regulatory risk assessment.
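As a rough illustration of the quantities the abstract refers to, the sketch below computes three global CDFT descriptors from a vertical ionization potential and electron affinity, and then the accuracy and false-negative rate of a binary mutagenicity classifier. It is not the authors' workflow, which is described as no-code. It assumes the common finite-difference conventions μ = −(IP + EA)/2, η = IP − EA, and ω = μ²/(2η); whether these match the paper's PSL or CDP definitions is not stated in the abstract. The function names and all numeric inputs are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GlobalDescriptors:
    chemical_potential: float  # mu, in eV
    hardness: float            # eta, in eV
    electrophilicity: float    # omega, in eV


def global_descriptors(ip_ev: float, ea_ev: float) -> GlobalDescriptors:
    """Finite-difference CDFT descriptors from vertical IP and EA (both in eV).

    Assumed conventions (not taken from the paper):
    mu = -(IP + EA) / 2, eta = IP - EA, omega = mu**2 / (2 * eta).
    """
    if ip_ev <= ea_ev:
        raise ValueError("IP must exceed EA so that the hardness is positive")
    mu = -(ip_ev + ea_ev) / 2.0
    eta = ip_ev - ea_ev
    omega = mu ** 2 / (2.0 * eta)
    return GlobalDescriptors(mu, eta, omega)


def accuracy_and_fnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, false-negative rate) for labels 1 = mutagenic, 0 = non-mutagenic.

    The false-negative rate is FN / (TP + FN): the share of true mutagens
    that the model wrongly labels as non-mutagenic.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    if tp + fn == 0:
        raise ValueError("no positive (mutagenic) samples; FNR is undefined")
    return correct / len(y_true), fn / (tp + fn)


if __name__ == "__main__":
    # Hypothetical IP/EA values in eV, chosen only to exercise the formulas.
    print(global_descriptors(ip_ev=8.0, ea_ev=2.0))
    # Hypothetical Ames labels and predictions.
    print(accuracy_and_fnr([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Reporting the false-negative rate alongside accuracy reflects the abstract's emphasis on conservative predictions: for risk assessment, a missed mutagen is the costlier error, so a model with high accuracy but many false negatives would be a poor choice.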
About the journal
The Science of the Total Environment is an international journal dedicated to scientific research on the environment and its interaction with humanity. It covers a wide range of disciplines and seeks to publish innovative, hypothesis-driven, and impactful research that explores the entire environment, including the atmosphere, lithosphere, hydrosphere, biosphere, and anthroposphere.
The journal's updated Aims & Scope emphasizes the importance of interdisciplinary environmental research with broad impact. Priority is given to studies that advance fundamental understanding and explore the interconnectedness of multiple environmental spheres. Field studies are preferred, while laboratory experiments must demonstrate significant methodological advancements or mechanistic insights with direct relevance to the environment.