Knowledge graph embedding for experimental uncertainty estimation

IF 2.6 Q2 INFORMATION SCIENCE & LIBRARY SCIENCE

Information Discovery and Delivery Pub Date : 2023-02-08 DOI:10.1108/idd-06-2022-0060

Edoardo Ramalli, B. Pernici


Citation count: 1

Abstract

Purpose

Experiments are the backbone of the development process of data-driven predictive models for scientific applications. The quality of the experiments directly impacts the model performance. Uncertainty inherently affects experiment measurements and is often missing in the available data sets due to its estimation cost. For similar reasons, experiments are very few compared to other data sources. Discarding experiments based on the missing uncertainty values would preclude the development of predictive models. Data profiling techniques are fundamental to assess data quality, but some data quality dimensions are challenging to evaluate without knowing the uncertainty. In this context, this paper aims to predict the missing uncertainty of the experiments.

Design/methodology/approach

This work presents a methodology to forecast the experiments’ missing uncertainty, given a data set and its ontological description. The approach is based on knowledge graph embeddings and leverages the task of link prediction over a knowledge graph representation of the experiments database. The validity of the methodology is first tested in multiple conditions using synthetic data and then applied to a large data set of experiments in the chemical kinetic domain as a case study.

Findings

The analysis results of different test case scenarios suggest that knowledge graph embedding can be used to predict the missing uncertainty of the experiments when there is a hidden relationship between the experiment metadata and the uncertainty values. The link prediction task is also resilient to random noise in the relationship. The knowledge graph embedding outperforms the baseline results if the uncertainty depends upon multiple metadata.

Originality/value

The employment of knowledge graph embedding to predict the missing experimental uncertainty is a novel alternative to the current and more costly techniques in the literature. Such contribution permits a better data quality profiling of scientific repositories and improves the development process of data-driven models based on scientific experiments.
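
To make the link-prediction idea in the Design/methodology/approach paragraph concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name an embedding model, so TransE is assumed here purely for illustration. The entity and relation names (`exp0`, `instrA`, `measuredWith`, `hasUncertainty`, `unc_low`, `unc_high`) and the idea of treating uncertainty as discrete classes are likewise hypothetical. Each experiment is a node linked to its metadata, and a missing uncertainty is predicted by ranking candidate tails for the query (experiment, hasUncertainty, ?).

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility of (h, r, t): values closer to 0 are more plausible."""
    return -np.linalg.norm(h + r - t, axis=-1)

def train_transe(triples, n_entities, n_relations, dim=16,
                 epochs=200, lr=0.05, margin=1.0, seed=0):
    """Margin-based SGD on squared distances with randomly corrupted tails."""
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_entities, dim))
    R = rng.normal(scale=0.1, size=(n_relations, dim))
    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = int(rng.integers(n_entities))  # corrupted tail
            if t_neg == t:
                continue
            pos = E[h] + R[r] - E[t]
            neg = E[h] + R[r] - E[t_neg]
            # Hinge loss: max(0, margin + |pos|^2 - |neg|^2)
            if margin + pos @ pos - neg @ neg > 0:
                grad = 2 * (pos - neg)
                E[h] -= lr * grad
                R[r] -= lr * grad
                E[t] += lr * 2 * pos
                E[t_neg] -= lr * 2 * neg
        # Keep entity vectors inside the unit ball, as in the usual TransE setup.
        E /= np.maximum(1.0, np.linalg.norm(E, axis=1, keepdims=True))
    return E, R

def predict_tail(E, R, h, r, candidates):
    """Rank candidate tail entities for the query (h, r, ?), best first."""
    scores = transe_score(E[h] + R[r], 0.0, E[candidates])
    order = np.argsort(-scores)
    return [candidates[i] for i in order]

if __name__ == "__main__":
    # Hypothetical toy graph: uncertainty class depends on the instrument used.
    ent = {n: i for i, n in enumerate(
        ["exp0", "exp1", "exp2", "exp3", "instrA", "instrB", "unc_low", "unc_high"])}
    rel = {"measuredWith": 0, "hasUncertainty": 1}
    triples = [
        (ent["exp0"], rel["measuredWith"], ent["instrA"]),
        (ent["exp0"], rel["hasUncertainty"], ent["unc_low"]),
        (ent["exp1"], rel["measuredWith"], ent["instrB"]),
        (ent["exp1"], rel["hasUncertainty"], ent["unc_high"]),
        (ent["exp2"], rel["measuredWith"], ent["instrA"]),
        (ent["exp2"], rel["hasUncertainty"], ent["unc_low"]),
        (ent["exp3"], rel["measuredWith"], ent["instrA"]),  # uncertainty missing
    ]
    E, R = train_transe(triples, len(ent), len(rel))
    ranking = predict_tail(E, R, ent["exp3"], rel["hasUncertainty"],
                           [ent["unc_low"], ent["unc_high"]])
    names = {i: n for n, i in ent.items()}
    print("Ranked uncertainty candidates for exp3:", [names[i] for i in ranking])
```

In practice a dedicated knowledge graph embedding library would replace this hand-written loop; the sketch only shows how a missing uncertainty value becomes a tail-ranking query over the experiment graph.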


Source journal

Information Discovery and Delivery (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore

5.40

Self-citation rate

4.80%

Number of publications

Journal description: Information Discovery and Delivery covers information discovery and access for digital information researchers. This includes educators, knowledge professionals in education and cultural organisations, knowledge managers in media, health care and government, as well as librarians. The journal publishes research and practice which explores the digital information supply chain, i.e. transport, flows, tracking, exchange and sharing, including within and between libraries. It is also interested in digital information capture, packaging and storage by ‘collectors’ of all kinds. Information is widely defined, including but not limited to: records, documents, learning objects, visual and sound files, data and metadata, and user-generated content.