模拟识别方法(AIM)片段的CSRML版本的开发及其在泛化跨读(GenRA)方法中的评估

IF 3.1 Q2 TOXICOLOGY

Computational Toxicology Pub Date : 2023-02-01 DOI:10.1016/j.comtox.2022.100256

Matthew Adams , Hannah Hidle , Daniel Chang , Ann M. Richard , Antony J. Williams , Imran Shah , Grace Patlewicz

{"title":"模拟识别方法(AIM)片段的CSRML版本的开发及其在泛化跨读(GenRA)方法中的评估","authors":"Matthew Adams , Hannah Hidle , Daniel Chang , Ann M. Richard , Antony J. Williams , Imran Shah , Grace Patlewicz","doi":"10.1016/j.comtox.2022.100256","DOIUrl":null,"url":null,"abstract":"<div>The Analog Identification Methodology (AIM) was developed over 20 years ago to identify analogues to support read-across at the US Environmental Protection Agency. However, the current public version of the standalone tool, released in 2012, is no longer usable on Windows operating systems supported by Microsoft. Additionally, the structural logic for analogue selection is based on older, customised Simplified molecular-input-line-entry system (SMILES)-type features that are incompatible with modern cheminformatics tools. Given these limitations, a case study was undertaken to explore a more transparent, extensible method of implementing the AIM fragments using Chemical Subgraphs and Reactions Mark-up Language (CSRML). A CSRML file was developed to codify the original AIM fragments, and the extent to which AIM fragments were faithfully replicated was assessed using the AIM Database. The overall mean performance of the CSRML-AIM across all fragments in terms of sensitivity, specificity, and Jaccard similarity was 89.5%, 99.9%, and 82.2%, respectively. Comparing the AIM fragments with public ToxPrints using a large set of ∼25,000 substances of regulatory interest to EPA found them to be dissimilar, with an average maximum Jaccard score of 0.24 for AIM and 0.29 for ToxPrint fingerprints. Both fragment sets were then used as inputs in the automated read-across approach, Generalised Read-Across (GenRA), to evaluate the quality of fit in predicting rat acute oral toxicity LD50 values with the coefficient of determination (R2) and root mean squared error (RMSE). The performance of AIM fragments was R2=0.434 and RMSE=0.663 whereas that of ToxPrints was R2=0.477 and RMSE=0.638. A bootstrap resampling using 100 iterations found the mean and the 95th confidence interval of R2 to be 0.349 [0.319, 0.379] for AIM fragments and 0.377 [0.338, 0.412] for ToxPrints. Although AIM and ToxPrints performed similarly in predicting LD50, they differed in their performance at a local level, revealing that their features can offer complementary insights.</div>","PeriodicalId":37651,"journal":{"name":"Computational Toxicology","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888031/pdf/","citationCount":"1","resultStr":"{\"title\":\"Development of a CSRML version of the Analog identification Methodology (AIM) fragments and their evaluation within the Generalised Read-Across (GenRA) approach\",\"authors\":\"Matthew Adams , Hannah Hidle , Daniel Chang , Ann M. Richard , Antony J. Williams , Imran Shah , Grace Patlewicz\",\"doi\":\"10.1016/j.comtox.2022.100256\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>The Analog Identification Methodology (AIM) was developed over 20 years ago to identify analogues to support read-across at the US Environmental Protection Agency. However, the current public version of the standalone tool, released in 2012, is no longer usable on Windows operating systems supported by Microsoft. Additionally, the structural logic for analogue selection is based on older, customised Simplified molecular-input-line-entry system (SMILES)-type features that are incompatible with modern cheminformatics tools. Given these limitations, a case study was undertaken to explore a more transparent, extensible method of implementing the AIM fragments using Chemical Subgraphs and Reactions Mark-up Language (CSRML). A CSRML file was developed to codify the original AIM fragments, and the extent to which AIM fragments were faithfully replicated was assessed using the AIM Database. The overall mean performance of the CSRML-AIM across all fragments in terms of sensitivity, specificity, and Jaccard similarity was 89.5%, 99.9%, and 82.2%, respectively. Comparing the AIM fragments with public ToxPrints using a large set of ∼25,000 substances of regulatory interest to EPA found them to be dissimilar, with an average maximum Jaccard score of 0.24 for AIM and 0.29 for ToxPrint fingerprints. Both fragment sets were then used as inputs in the automated read-across approach, Generalised Read-Across (GenRA), to evaluate the quality of fit in predicting rat acute oral toxicity LD50 values with the coefficient of determination (R2) and root mean squared error (RMSE). The performance of AIM fragments was R2=0.434 and RMSE=0.663 whereas that of ToxPrints was R2=0.477 and RMSE=0.638. A bootstrap resampling using 100 iterations found the mean and the 95th confidence interval of R2 to be 0.349 [0.319, 0.379] for AIM fragments and 0.377 [0.338, 0.412] for ToxPrints. Although AIM and ToxPrints performed similarly in predicting LD50, they differed in their performance at a local level, revealing that their features can offer complementary insights.</div>\",\"PeriodicalId\":37651,\"journal\":{\"name\":\"Computational Toxicology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2023-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888031/pdf/\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Toxicology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2468111322000445\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"TOXICOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Toxicology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468111322000445","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"TOXICOLOGY","Score":null,"Total":0}

引用次数: 1

摘要

模拟物识别方法(AIM)是在20多年前开发的，用于识别类似物，以支持美国环境保护署的读取。但是，该独立工具的当前公开版本(2012年发布)已无法在微软支持的Windows操作系统上使用。此外，模拟物选择的结构逻辑是基于旧的，定制的简化分子输入行输入系统(SMILES)类型的特征，与现代化学信息学工具不兼容。考虑到这些限制，我们进行了一个案例研究，探索一种使用化学子图和反应标记语言(CSRML)实现AIM片段的更透明、可扩展的方法。开发了一个CSRML文件来对原始AIM片段进行编码，并使用AIM数据库评估AIM片段被忠实复制的程度。CSRML-AIM在所有片段的敏感性、特异性和Jaccard相似性方面的总体平均表现分别为89.5%、99.9%和82.2%。将AIM片段与公共ToxPrints进行比较，使用大量的约25,000种对EPA具有监管意义的物质，发现它们是不同的，AIM和ToxPrint指纹的平均最大Jaccard分数分别为0.24和0.29。然后将这两个片段集用作自动读取方法的输入，即广义读取(GenRA)，以确定系数(R2)和均方根误差(RMSE)评估预测大鼠急性口服毒性LD50值的拟合质量。AIM片段的检测效能R2=0.434, RMSE=0.663, ToxPrints的检测效能R2=0.477, RMSE=0.638。使用100次迭代的bootstrap重采样发现，AIM片段的R2均值和第95可信区间为0.349 [0.319,0.379]，ToxPrints的R2均值和可信区间为0.377[0.338,0.412]。尽管AIM和ToxPrints在预测LD50方面表现相似，但它们在局部水平上的表现不同，这表明它们的特征可以提供互补的见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Development of a CSRML version of the Analog identification Methodology (AIM) fragments and their evaluation within the Generalised Read-Across (GenRA) approach

The Analog Identification Methodology (AIM) was developed over 20 years ago to identify analogues to support read-across at the US Environmental Protection Agency. However, the current public version of the standalone tool, released in 2012, is no longer usable on Windows operating systems supported by Microsoft. Additionally, the structural logic for analogue selection is based on older, customised Simplified molecular-input-line-entry system (SMILES)-type features that are incompatible with modern cheminformatics tools. Given these limitations, a case study was undertaken to explore a more transparent, extensible method of implementing the AIM fragments using Chemical Subgraphs and Reactions Mark-up Language (CSRML). A CSRML file was developed to codify the original AIM fragments, and the extent to which AIM fragments were faithfully replicated was assessed using the AIM Database. The overall mean performance of the CSRML-AIM across all fragments in terms of sensitivity, specificity, and Jaccard similarity was 89.5%, 99.9%, and 82.2%, respectively. Comparing the AIM fragments with public ToxPrints using a large set of ∼25,000 substances of regulatory interest to EPA found them to be dissimilar, with an average maximum Jaccard score of 0.24 for AIM and 0.29 for ToxPrint fingerprints. Both fragment sets were then used as inputs in the automated read-across approach, Generalised Read-Across (GenRA), to evaluate the quality of fit in predicting rat acute oral toxicity LD₅₀ values with the coefficient of determination (R²) and root mean squared error (RMSE). The performance of AIM fragments was R²=0.434 and RMSE=0.663 whereas that of ToxPrints was R²=0.477 and RMSE=0.638. A bootstrap resampling using 100 iterations found the mean and the 95th confidence interval of R² to be 0.349 [0.319, 0.379] for AIM fragments and 0.377 [0.338, 0.412] for ToxPrints. Although AIM and ToxPrints performed similarly in predicting LD_50, they differed in their performance at a local level, revealing that their features can offer complementary insights.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computational Toxicology Computer Science-Computer Science Applications

CiteScore

5.50

自引率

0.00%

发文量

审稿时长

56 days

期刊介绍： Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation of new and existing chemical substances assisting in their safety assessment. -All effects relating to human health and environmental toxicity and fate -Prediction of toxicity, metabolism, fate and physico-chemical properties -The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models -Big Data in toxicology: integration, management, analysis -Implementation of models through AOPs, IATA, TTC -Regulatory acceptance of models: evaluation, verification and validation -From metals, to small organic molecules to nanoparticles -Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals -Bringing together the views of industry, regulators, academia, NGOs