Evaluating diabetes dataset for knowledge graph embedding based link prediction

IF 2.7 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Sushmita Singh, Manvi Siwach
{"title":"Evaluating diabetes dataset for knowledge graph embedding based link prediction","authors":"Sushmita Singh,&nbsp;Manvi Siwach","doi":"10.1016/j.datak.2025.102414","DOIUrl":null,"url":null,"abstract":"<div><div>For doing any accurate analysis or prediction on data, a complete and well-populated dataset is required. Medical based data for any disease like diabetes is highly coupled and heterogeneous in nature, with numerous interconnections. This inherently complex data cannot be analysed by simple relational databases making knowledge graphs an ideal tool for its representation which can efficiently handle intricate relationships. Thus, knowledge graphs can be leveraged to analyse diabetes data, enhancing both the accuracy and efficiency of data-driven decision-making processes. Although substantial data exists on diabetes in various formats, the availability of organized and complete datasets is limited, highlighting the critical need for creation of a well-populated knowledge graph. Moreover while developing the knowledge graph, an inevitable problem of incompleteness is present due to missing links or relationships, necessitating the use of knowledge graph completion tasks to fill in this absent information which involves predicting missing data with various Link Prediction (LP) techniques. Among various link prediction methods, approaches based on knowledge graph embeddings have demonstrated superior performance and effectiveness. These knowledge graphs can support in-depth analysis and enhance the prediction of diabetes-associated risks in this field. This paper introduces a dataset specifically designed for performing link prediction on a diabetes knowledge graph, so that it can be used to fill the information gaps further contributing in the domain of risk analysis in diabetes. The accuracy of the dataset is assessed through validation with state-of-the-art embedding-based link prediction methods.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102414"},"PeriodicalIF":2.7000,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X25000096","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

For doing any accurate analysis or prediction on data, a complete and well-populated dataset is required. Medical based data for any disease like diabetes is highly coupled and heterogeneous in nature, with numerous interconnections. This inherently complex data cannot be analysed by simple relational databases making knowledge graphs an ideal tool for its representation which can efficiently handle intricate relationships. Thus, knowledge graphs can be leveraged to analyse diabetes data, enhancing both the accuracy and efficiency of data-driven decision-making processes. Although substantial data exists on diabetes in various formats, the availability of organized and complete datasets is limited, highlighting the critical need for creation of a well-populated knowledge graph. Moreover while developing the knowledge graph, an inevitable problem of incompleteness is present due to missing links or relationships, necessitating the use of knowledge graph completion tasks to fill in this absent information which involves predicting missing data with various Link Prediction (LP) techniques. Among various link prediction methods, approaches based on knowledge graph embeddings have demonstrated superior performance and effectiveness. These knowledge graphs can support in-depth analysis and enhance the prediction of diabetes-associated risks in this field. This paper introduces a dataset specifically designed for performing link prediction on a diabetes knowledge graph, so that it can be used to fill the information gaps further contributing in the domain of risk analysis in diabetes. The accuracy of the dataset is assessed through validation with state-of-the-art embedding-based link prediction methods.
评估基于知识图嵌入的糖尿病数据集链接预测
为了对数据进行任何准确的分析或预测,需要一个完整且人口稠密的数据集。任何疾病(如糖尿病)的医学数据本质上都是高度耦合和异构的,有许多相互联系。简单的关系数据库无法对这种复杂的数据进行分析,这使得知识图成为一种理想的表达工具,可以有效地处理复杂的关系。因此,知识图谱可以用来分析糖尿病数据,提高数据驱动决策过程的准确性和效率。尽管存在各种形式的关于糖尿病的大量数据,但有组织和完整的数据集的可用性是有限的,这突出了创建一个人口稠密的知识图谱的迫切需要。此外,在开发知识图谱时,由于缺少链接或关系而不可避免地存在不完备性问题,需要使用知识图谱补全任务来填补这些缺失信息,这涉及到使用各种链接预测(Link Prediction, LP)技术来预测缺失数据。在各种链接预测方法中,基于知识图嵌入的链接预测方法表现出优异的性能和有效性。这些知识图谱可以支持深入分析并增强该领域糖尿病相关风险的预测。本文介绍了一个专门设计的数据集,用于对糖尿病知识图谱进行链接预测,从而可以用来填补信息空白,进一步为糖尿病风险分析领域做出贡献。通过最先进的基于嵌入的链接预测方法的验证来评估数据集的准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Data & Knowledge Engineering
Data & Knowledge Engineering 工程技术-计算机:人工智能
CiteScore
5.00
自引率
0.00%
发文量
66
审稿时长
6 months
期刊介绍: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信