A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory quantitative structure activity relationships (QSAR) modeling targeting cellular viability

IF 4.7 · Zone 3 (Environmental Science and Ecology) · JCR Q2 (Environmental Sciences)
Irini Furxhi, Egon Willighagen, Chris Evelo, Anna Costa, Davide Gardini, Ammar Ammar
DOI: 10.1016/j.impact.2023.100475
Journal: NanoImpact
Published: 2023-07-01
Publication type: Journal Article
Source: https://www.sciencedirect.com/science/article/pii/S2452074823000265
Citations: 0

Abstract

Introduction

The current effort towards digital transformation across multiple scientific domains requires data that is Findable, Accessible, Interoperable and Reusable (FAIR). Beyond FAIR data, applying computational tools such as Quantitative Structure Activity Relationships (QSARs) requires a sufficient data volume and the ability to merge sources into homogeneous digital assets. The nanosafety domain lacks available FAIR metadata.

Methodology

To address this challenge, we applied the NanoSafety Data Reusability Assessment (NSDRA) framework to 34 datasets from the nanosafety domain, annotating and assessing each dataset's reusability. Based on the results of that assessment, eight datasets targeting the same endpoint (i.e. numerical cellular viability) were selected, processed and merged to test several hypotheses, including universal versus nanogroup-specific QSAR models (metal oxides and nanotubes) and regression versus classification Machine Learning (ML) algorithms.
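The merging step described above can be illustrated with a minimal sketch. It assumes each source dataset is a list of records with its own column names. The column names, the `COLUMN_MAP` table and the helper functions are all hypothetical and do not come from the study; they only show how heterogeneous sources might be mapped onto one schema and then split by nanogroup.

```python
# Hypothetical per-source column names mapped onto one shared schema.
# None of these names come from the study; they are illustrative only.
COLUMN_MAP = {
    "core_size_nm": "core_size_nm",
    "size (nm)": "core_size_nm",
    "dose_ug_ml": "dose_ug_per_ml",
    "concentration": "dose_ug_per_ml",
    "viability_pct": "viability_pct",
    "cell viability (%)": "viability_pct",
    "material_group": "nanogroup",
    "nanogroup": "nanogroup",
}


def harmonize(record, dataset_id):
    """Rename known columns to the shared schema and tag the record's source."""
    row = {"dataset_id": dataset_id}
    for key, value in record.items():
        target = COLUMN_MAP.get(key.strip().lower())
        if target is not None:  # columns without a mapping are dropped
            row[target] = value
    return row


def merge_datasets(datasets):
    """Merge {dataset_id: [records]} into one list of harmonized rows."""
    merged = []
    for dataset_id, records in datasets.items():
        merged.extend(harmonize(r, dataset_id) for r in records)
    return merged


def split_by_nanogroup(rows):
    """Group merged rows by nanogroup, e.g. for group-specific models."""
    groups = {}
    for row in rows:
        groups.setdefault(row.get("nanogroup", "unknown"), []).append(row)
    return groups


# Two toy sources with different column conventions (invented values).
datasets = {
    "A": [{"Size (nm)": 20, "Concentration": 10,
           "Cell viability (%)": 85, "Nanogroup": "metal oxide"}],
    "B": [{"core_size_nm": 5, "dose_ug_ml": 50,
           "viability_pct": 40, "material_group": "nanotube"}],
}

universal = merge_datasets(datasets)      # input for a universal model
by_group = split_by_nanogroup(universal)  # inputs for group-specific models
```

Keeping a `dataset_id` on every row is a design choice of this sketch: it preserves provenance after merging, which any later per-source evaluation would need.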

Results

Universal regression and classification QSARs reached an R² of 0.86 and an accuracy of 0.92, respectively, on the test set. Nanogroup-specific regression models reached an R² of 0.88 on the nanotube test set, followed by metal oxides (0.78). Nanogroup-specific classification models reached an accuracy of 0.99 on the nanotube test set, followed by metal oxides (0.91). Feature importance revealed different patterns depending on the dataset; common influential features included core size, exposure conditions and toxicological assay.
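For readers less familiar with the two metrics reported above, the following sketch computes R² for a regression output and accuracy for a classification output using their standard definitions. The toy numbers are invented, and the 50% viability cut-off used to derive class labels is an assumption for illustration; the abstract does not state how classes were defined.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if all true values are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of predictions that match the true class label."""
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)


def to_class(viability_pct, threshold=50.0):
    """Binarize viability; the 50% cut-off is an assumption, not from the study."""
    return "toxic" if viability_pct < threshold else "non-toxic"


# Invented viability values (%), for illustration only.
measured = [90.0, 70.0, 40.0, 20.0]
predicted = [80.0, 75.0, 45.0, 30.0]

r2 = r2_score(measured, predicted)
acc = accuracy([to_class(v) for v in measured],
               [to_class(v) for v in predicted])
```

The toy data shows why the two metrics are not directly comparable: a prediction can be off by several percentage points (lowering R²) while still falling on the correct side of the class threshold (leaving accuracy unchanged).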

Even when the available experimental knowledge was merged, the models still failed to correctly predict the outputs of an unseen dataset, exposing the difficulty of achieving scientific reproducibility in realistic applications of QSAR for nanosafety. To harness the full potential of computational tools and ensure their long-term application, embracing FAIR data practices is imperative for developing responsible QSAR models.
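One common way to test predictions on an unseen dataset is a leave-one-dataset-out split, where every row from one source is withheld from training. The abstract does not say which protocol the authors used, so the sketch below is an assumption, and the `dataset_id` field and toy rows are hypothetical.

```python
def leave_one_dataset_out(rows, key="dataset_id"):
    """Yield (held_out_id, train_rows, test_rows), holding out one source at a time."""
    dataset_ids = sorted({row[key] for row in rows})
    for held_out in dataset_ids:
        train = [row for row in rows if row[key] != held_out]
        test = [row for row in rows if row[key] == held_out]
        yield held_out, train, test


# Invented rows; only the source tag matters for the split.
rows = [
    {"dataset_id": "A", "viability_pct": 85},
    {"dataset_id": "A", "viability_pct": 60},
    {"dataset_id": "B", "viability_pct": 40},
    {"dataset_id": "C", "viability_pct": 95},
]

splits = list(leave_one_dataset_out(rows))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Unlike a random train/test split, this scheme never lets rows from the held-out source leak into training, so it gives a stricter estimate of how a model transfers to data from a new laboratory or study.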

Conclusions

This study shows that the reproducible digitalization of nanosafety knowledge still has a long way to go before it can be implemented successfully in practice. The workflow carried out in the study is a promising approach to increasing FAIRness across all elements of computational studies, from dataset annotation, selection and merging to FAIR reporting of models. This has significant implications for future research, as it provides an example of how to use and report the different tools available in the nanosafety knowledge system while increasing the transparency of the results. One of the main benefits of this workflow is that it promotes data sharing and reuse by making data and metadata FAIR compliant, which is essential for advancing scientific knowledge. In addition, the increased transparency and reproducibility of the results can enhance the trustworthiness of the computational findings.

Source journal: NanoImpact (Social Sciences, Safety Research)
CiteScore: 11.00
Self-citation rate: 6.10%
Articles published: 69
Review time: 23 days
About the journal: NanoImpact is a multidisciplinary journal that focuses on nanosafety research and areas related to the impacts of manufactured nanomaterials on human and environmental systems and the behavior of nanomaterials in these systems.