Can graph similarity metrics be helpful for analogue identification as part of a read-across approach?

Impact Factor: 3.1 · JCR: Q2 (Toxicology)

Computational Toxicology, Volume 34, Article 100353 · Pub Date: 2025-05-08 · DOI: 10.1016/j.comtox.2025.100353

Brett Hagan, Imran Shah, Grace Patlewicz


Citations: 0

Abstract


Can graph similarity metrics be helpful for analogue identification as part of a read-across approach?

Read-across is a technique used to fill data gaps for substances lacking specific hazard data. The technique relies on identifying source analogues with relevant data that are ‘similar’ to the substance of interest (the target). Typically, source analogues are identified on the basis of structural similarity, but the evaluation of their suitability for read-across depends on other contexts of similarity. This manuscript aimed to review the ways in which source analogues are identified for read-across using chemical fingerprint/scaffold approaches, before describing graph-based approaches including graph kernel, graph embedding, and deep learning methods. To demonstrate how these could be used in practice for analogue identification, five toxicity datasets of varying size and diversity were selected, each of which had been the subject of previous read-across or QSAR analyses. One dataset was an analogue set, whereas the other four comprised substances evaluated for their skin sensitisation, skin irritation, fathead minnow aquatic toxicity and genotoxicity potential. The analogues and their associated similarities from the different graph-based approaches were compared with the outcomes from two chemical fingerprint approaches (ToxPrints and Morgan). The results for each dataset are briefly described. Based on the examples evaluated, graph kernel approaches were found to have some promise; in contrast, unsupervised whole-graph embedding approaches were ineffective for all the datasets evaluated. Graph convolutional networks produced meaningful embeddings for the genotoxicity dataset evaluated. Depending on the use case and on the availability and size of training data, graph similarity approaches have the potential to play a larger role in analogue identification and evaluation for read-across.
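The abstract does not say which graph kernel the authors used or how molecules were encoded, so the following is only a minimal sketch of the general idea, assuming a Weisfeiler-Lehman subtree kernel over heavy-atom graphs. The function names (`wl_features`, `wl_similarity`, `rank_analogues`) and the toy molecules are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter
from math import sqrt


def wl_features(labels, adjacency, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one molecular graph.

    labels: dict mapping node id -> atom symbol (assumed encoding).
    adjacency: dict mapping node id -> list of neighbouring node ids.
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        # Relabel each node with its own label plus the sorted neighbour labels.
        current = {
            node: (current[node], tuple(sorted(current[n] for n in adjacency[node])))
            for node in current
        }
        counts.update(current.values())
    return counts


def wl_similarity(graph_a, graph_b, iterations=2):
    """Cosine-normalised WL subtree kernel, between 0 and 1."""
    fa = wl_features(*graph_a, iterations=iterations)
    fb = wl_features(*graph_b, iterations=iterations)
    dot = sum(fa[key] * fb[key] for key in fa.keys() & fb.keys())
    norm = sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


def rank_analogues(target, candidates, iterations=2):
    """Return (name, similarity) pairs, most similar candidate first."""
    scored = [(name, wl_similarity(target, graph, iterations))
              for name, graph in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy heavy-atom graphs (hydrogens omitted); purely illustrative.
ETHANOL = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})
CANDIDATES = {
    "1-propanol": ({0: "C", 1: "C", 2: "C", 3: "O"},
                   {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}),
    "propane": ({0: "C", 1: "C", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]}),
    "dimethyl ether": ({0: "C", 1: "O", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]}),
}

if __name__ == "__main__":
    for name, score in rank_analogues(ETHANOL, CANDIDATES):
        print(f"{name}: {score:.3f}")
```

In this toy case the candidate that shares both the carbon chain and the terminal hydroxyl environment with the target ranks first. A fingerprint baseline such as Morgan with Tanimoto similarity would be computed separately for comparison, which this sketch does not attempt.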


Source journal

Computational Toxicology (Computer Science: Computer Science Applications)

CiteScore: 5.50

Self-citation rate: 0.00%

Articles published:

Review time: 56 days

Journal description: Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation of new and existing chemical substances, assisting in their safety assessment.
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules, to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs