Modelling In vitro Mutagenicity Using Multi-Task Deep Learning and REACH Data.

IF 3.8; Zone 3 (Medicine); JCR Q2 (CHEMISTRY, MEDICINAL)
Panagiotis G Karamertzanis, Mike Rasenberg, Imran Shah, Grace Patlewicz
Journal: Chemical Research in Toxicology
Published: 2025-07-18
DOI: 10.1021/acs.chemrestox.5c00152
Citations: 0

Abstract

Under REACH, mutagenicity assessment relies on in vitro testing (gene mutation tests in bacteria and/or mammalian cells, and chromosomal aberration or micronucleus assays in mammalian cells), followed by in vivo testing where necessary. This study explored whether the inherent correlation between these in vitro assays can be exploited to build multi-task deep learning models, and whether such models outperform single-task models. An extensive genotoxicity dataset covering more than 12,000 substances was compiled from algorithmically curated REACH data and several public sources. Genotoxicity information was also retrieved from ToxValDB and the literature to construct external (hold-out) test sets for a stringent assessment of how well the models generalize. The single-task and multi-task models investigated ranged from classical machine learning techniques with chemical fingerprints to deep learning methods that represent molecular structure as graphs. The best single-task deep learning model achieved a cross-validation balanced accuracy of 73-84% across the four in vitro assays, exceeding classical machine learning by 2-8%. Gene mutation detection for specific bacterial strains and metabolic activation modes reached a balanced accuracy of 82-85%, with improvements of 7-12%. Multi-task deep learning models for specific bacterial strains and metabolic activation modes had, on average, an 8% higher cross-validation test balanced accuracy than single-task models, but the two were comparable when assay outcomes were aggregated. The best deep learning models for specific bacterial strains and metabolic activation modes reached an external balanced accuracy of 72-78% when at least 200 positives and 200 negatives were available. Dimensionality-reduced molecular embeddings from the graph neural network models separated positives from negatives and clustered structures that trigger known genotoxicity structural alerts. The models were also used to identify structural moieties linked to predicted negative genotoxicity in bacteria and predicted positive genotoxicity in mammalian cells.
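Balanced accuracy, the metric behind every performance figure quoted above, is the mean of sensitivity and specificity, so a model cannot score well simply by predicting the majority class. The abstract does not describe the evaluation code, so the following is only a minimal Python sketch under stated assumptions: each substance/assay pair is labelled 1 (positive), 0 (negative) or None (not tested), an assumed encoding for the fact that not every substance has an outcome in every assay; each task is scored only on substances with an observed outcome; and a task is reported only when it has at least `min_per_class` positives and negatives, a hypothetical filter modelled on the "at least 200 positives and 200 negatives" condition. The function names, label encoding and task names are illustrative, not the authors'.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical label encoding (an assumption, not the authors' format):
# 1 = positive, 0 = negative, None = substance not tested in that assay.
Label = Optional[int]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def per_task_balanced_accuracy(
    labels: Sequence[Sequence[Label]],
    predictions: Sequence[Sequence[int]],
    task_names: Sequence[str],
    min_per_class: int = 200,
) -> Dict[str, Tuple[Optional[float], int, int]]:
    """Score each task (column) only on substances with an observed outcome.

    Returns {task: (balanced accuracy, n_positives, n_negatives)}. The score is
    None when a task has fewer than `min_per_class` positives or negatives
    (a hypothetical reporting filter modelled on the 200/200 condition).
    """
    results: Dict[str, Tuple[Optional[float], int, int]] = {}
    for j, task in enumerate(task_names):
        # Keep only (label, prediction) pairs where the assay outcome is known;
        # predictions for untested substance/assay pairs are ignored.
        observed = [(row[j], pred[j]) for row, pred in zip(labels, predictions)
                    if row[j] is not None]
        y_true = [t for t, _ in observed]
        y_pred = [p for _, p in observed]
        n_pos = sum(y_true)
        n_neg = len(y_true) - n_pos
        if n_pos < min_per_class or n_neg < min_per_class:
            results[task] = (None, n_pos, n_neg)
        else:
            results[task] = (balanced_accuracy(y_true, y_pred), n_pos, n_neg)
    return results


if __name__ == "__main__":
    # Toy data: 5 substances x 2 hypothetical tasks; None marks an untested pair.
    labels = [[1, None], [1, 0], [0, 1], [0, None], [1, 0]]
    predictions = [[1, 1], [0, 0], [0, 1], [1, 0], [1, 1]]
    print(per_task_balanced_accuracy(labels, predictions, ["task_a", "task_b"], min_per_class=2))
```

On the toy data, task_a is scored on all five substances: a sensitivity of 2/3 and a specificity of 1/2 give a balanced accuracy of 7/12 (about 0.583). task_b has only three observed outcomes (one positive, two negatives), so with `min_per_class=2` its score is reported as None rather than as a number computed from too few examples.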

Source journal: Chemical Research in Toxicology
CiteScore: 7.90
Self-citation rate: 7.30%
Articles published: 215
Review time: 3.5 months
Journal description: Chemical Research in Toxicology publishes Articles, Rapid Reports, Chemical Profiles, Reviews, Perspectives, Letters to the Editor, and ToxWatch on a wide range of topics in Toxicology that inform a chemical and molecular understanding and capacity to predict biological outcomes on the basis of structures and processes. The overarching goal of activities reported in the Journal is to provide knowledge and innovative approaches needed to promote intelligent solutions for human safety and ecosystem preservation. The journal emphasizes insight concerning mechanisms of toxicity over phenomenological observations. It upholds rigorous chemical, physical and mathematical standards for characterization and application of modern techniques.