From literature to predictive modeling: Insights and machine learning applications from in vitro comet assays related to the genotoxicity of titanium dioxide nanomaterials
Irini Furxhi , Mahsa Mirzaei , Anna Costa , Rossella Bengalli
NanoImpact, Volume 38, Article 100562. Published 2025-04-01. DOI: 10.1016/j.impact.2025.100562
Impact factor: 4.7. JCR: Q2 (Environmental Sciences).
https://www.sciencedirect.com/science/article/pii/S2452074825000229
Abstract
The genotoxicity of titanium dioxide nanomaterials (TiO₂ NMs) remains a debated topic in the scientific community. In this study, we applied the read-across concept based on machine learning (ML) algorithms to predict the genotoxic potential of TiO₂ NMs. Key objectives included: (i) compiling a systematic dataset capturing DNA damage percentage from in vitro comet assays, (ii) creating a homogenized dataset integrating physicochemical properties, exposure conditions, and experimental details, (iii) training ML models for prediction, (iv) evaluating model performance, and (v) identifying the features that contribute the most to predictive accuracy. The dataset was divided into three parts: the Entire dataset (all features), the Physicochemical (P-chem) dataset, and the Experimental design dataset. Extra Trees Regressor and XGB Regressor demonstrated high predictive accuracy, achieving R² values of 0.906 and 0.788 on the P-chem and Experimental design datasets, respectively. Exposure concentration, cold lysis conditions, and electrophoresis parameters emerged as key predictors of DNA damage, alongside contributions from NM properties. These findings highlight the intricate interplay between NM properties and experimental conditions in genotoxicity assessments. By providing a FAIR dataset, this study facilitates future research, allowing for the integration of additional variables and quality criteria to enhance the modeling approach. This work reinforces the value of nano-informatics in nanosafety and serves as a footing for advancing data-driven hazard assessment methodologies, positioning ML-enabled read-across strategies as a valuable tool for regulatory nanosafety frameworks.
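The abstract reports model performance as R² without stating how it was computed. As a minimal sketch, assuming the usual coefficient of determination (1 minus the ratio of residual to total sum of squares), the metric could be calculated as below. The function name `r_squared` and the DNA damage percentages are illustrative assumptions and are not taken from the study.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = fmean(observed)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up DNA damage percentages for illustration only (NOT data from the study).
observed = [5.0, 12.0, 20.0, 35.0]
predicted = [6.0, 10.0, 22.0, 33.0]
score = r_squared(observed, predicted)
print(f"R^2 = {score:.3f}")
```

Under this definition a value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the mean observed DNA damage.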
About the journal:
NanoImpact is a multidisciplinary journal that focuses on nanosafety research and areas related to the impacts of manufactured nanomaterials on human and environmental systems and the behavior of nanomaterials in these systems.