Modelling In Vitro Mutagenicity Using Multi-Task Deep Learning and REACH Data

IF 3.8 | Zone 3 (Medicine) | Q2 | CHEMISTRY, MEDICINAL

Chemical Research in Toxicology | Pub Date: 2025-07-18 | DOI: 10.1021/acs.chemrestox.5c00152

Panagiotis G. Karamertzanis*, Mike Rasenberg, Imran Shah and Grace Patlewicz

Volume 38, Issue 8, Pages 1382–1407

Citations: 0

Abstract



Under REACH, mutagenicity assessment relies on in vitro testing (gene mutation test in bacteria and/or mammalian cells, as well as chromosomal aberration or micronucleus assays in mammalian cells) followed by in vivo testing if necessary. This study explored the possibility of using the inherent correlation between these in vitro assays to create multi-task deep learning models and examined whether they outperform single-task models. An extensive genotoxicity dataset with over 12,000 substances was compiled, including algorithmically curated REACH data and information from several public sources. Genotoxicity information was also retrieved from ToxValDB and literature sources to construct external (hold-out) test sets for a stringent assessment of the models’ generalized performance. A range of single-task and multi-task models were investigated, from classical machine learning techniques and chemical fingerprints to deep learning methods using graphs for molecular structure representation. The best deep learning single-task model achieved a cross-validation balanced accuracy of 73–84% for the four in vitro assays and exceeded classical machine learning by 2–8%. Gene mutation detection for specific bacterial strains and metabolic activation modes exhibited a balanced accuracy of 82–85%, with improvements ranging from 7% to 12%. Multi-task deep learning models for specific bacterial strains and metabolic activation modes had on average 8% higher cross-validation test balanced accuracy than single-task models but were comparable when assay outcomes were aggregated. The best deep learning models for specific bacterial strains and metabolic activation modes showed external balanced accuracy of 72–78% when there were at least 200 positives and 200 negatives. The dimensionality-reduced molecular embeddings from graph neural network models were able to distinguish positives from negatives and cluster structures that trigger known genotoxicity structural alerts. The models were also used to identify structural moieties linked to predicted negative genotoxicity in bacteria and positive genotoxicity in mammalian cells.
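The abstract reports every result as balanced accuracy, and reports external results only for tasks with at least 200 positives and 200 negatives. As a minimal sketch of how such a figure can be computed, the snippet below uses the standard definition (the mean of sensitivity and specificity) for one binary assay. The function name, the `min_per_class` parameter, the use of `None` for untested substances, and the toy labels are all assumptions made for illustration; the study's own evaluation code is not shown in this record and may differ.

```python
from typing import Optional, Sequence


def balanced_accuracy(
    y_true: Sequence[Optional[int]],
    y_pred: Sequence[int],
    min_per_class: int = 1,
) -> Optional[float]:
    """Mean of sensitivity and specificity for one binary assay.

    Entries of y_true equal to None (substance not tested in this assay,
    an assumed encoding) are skipped. Returns None when either class has
    fewer than min_per_class labelled substances.
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth is None:
            continue  # no experimental outcome for this substance
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1

    positives = tp + fn
    negatives = tn + fp
    # Require at least one substance per class to avoid dividing by zero.
    threshold = max(min_per_class, 1)
    if positives < threshold or negatives < threshold:
        return None

    sensitivity = tp / positives
    specificity = tn / negatives
    return 0.5 * (sensitivity + specificity)


# Toy labels for illustration only (not data from the study).
labels = [1, 1, 1, 0, 0, 0, 0, None]
preds = [1, 1, 0, 0, 0, 0, 1, 1]
score = balanced_accuracy(labels, preds)
too_small = balanced_accuracy(labels, preds, min_per_class=200)
```

Because balanced accuracy weights the two classes equally, it is less sensitive than plain accuracy to the class imbalance that is common in genotoxicity data. Setting `min_per_class=200` mirrors the reporting threshold quoted in the abstract, so the toy set above yields no score under that setting.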


Source journal

Chemical Research in Toxicology (Medicine: Toxicology)

CiteScore: 7.90
Self-citation rate: 7.30%
Articles published: 215
Review time: 3.5 months

Journal description: Chemical Research in Toxicology publishes Articles, Rapid Reports, Chemical Profiles, Reviews, Perspectives, Letters to the Editor, and ToxWatch on a wide range of topics in Toxicology that inform a chemical and molecular understanding and capacity to predict biological outcomes on the basis of structures and processes. The overarching goal of activities reported in the Journal is to provide knowledge and innovative approaches needed to promote intelligent solutions for human safety and ecosystem preservation. The journal emphasizes insight concerning mechanisms of toxicity over phenomenological observations. It upholds rigorous chemical, physical and mathematical standards for characterization and application of modern techniques.