A high-accuracy unsupervised statistical learning method for joint dangling entity detection and entity alignment

IF 3.4 · Zone 2 (Computer Science) · JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Pub Date : 2025-04-11 DOI:10.1016/j.is.2025.102554

Cong Xu, Mengxin Shi, Xiang Gao, Zhongkang Yin, Xiujuan Yao, Wei Li, Jiasen Yang

Information Systems, Volume 133, Article 102554. https://www.sciencedirect.com/science/article/pii/S0306437925000390

Citations: 0

Abstract

Dangling entities are common in knowledge graphs, but entity alignment in their presence has received little research attention. Most existing studies rely on neural networks trained with supervision. However, these data-driven methods suffer from poor interpretability and high computational overhead. In this paper, we propose a Simple Unsupervised Dangling entity detection and entity Alignment method (SUDA) that does not employ neural networks. Our method consists of three modules: entity embedding, dangling entity detection, and entity alignment. While the state-of-the-art Simple but Effective Unsupervised entity alignment method (SEU) cannot handle dangling entities, SUDA extends it and addresses the bilateral dangling entity problem. A theoretical proof of our method is given. We also design a new adjacency matrix that incorporates richer entity relations. We then construct entity similarity outlier intervals to detect dangling entities and, after removing them, align the remaining entities by solving an assignment problem. Extensive experiments demonstrate that our method outperforms the compared supervised and unsupervised methods. Additionally, in entity alignment tasks, SUDA requires less runtime than neural network methods while maintaining high efficiency, interpretability, and stability. Code is available at https://github.com/skyccong/SUDA.git.
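The detect-then-align idea in the abstract can be illustrated with a minimal sketch. This is not the SUDA implementation: the abstract does not specify how the outlier intervals or the adjacency matrix are built, so the sketch assumes precomputed entity embeddings, uses a lower Tukey fence (Q1 - k * IQR) on each entity's best-match cosine similarity as a stand-in outlier rule, and solves the assignment step with SciPy's `linear_sum_assignment`. The function name `detect_and_align` and the parameter `k` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def detect_and_align(src_emb, tgt_emb, k=1.5):
    """Illustrative sketch only; not the authors' SUDA code."""
    # Cosine similarity between every source and every target entity.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T

    def keep_mask(best):
        # Assumed outlier rule: an entity whose best-match similarity falls
        # below the lower Tukey fence (Q1 - k * IQR) is treated as dangling.
        q1, q3 = np.percentile(best, [25, 75])
        return best >= q1 - k * (q3 - q1)

    # Bilateral detection: check both graphs for dangling entities.
    src_keep = np.flatnonzero(keep_mask(sim.max(axis=1)))
    tgt_keep = np.flatnonzero(keep_mask(sim.max(axis=0)))

    # Align the remaining entities by maximizing total similarity
    # (a linear assignment problem).
    sub = sim[np.ix_(src_keep, tgt_keep)]
    rows, cols = linear_sum_assignment(sub, maximize=True)
    pairs = [(int(src_keep[r]), int(tgt_keep[c])) for r, c in zip(rows, cols)]

    src_dangling = sorted(set(range(len(src_emb))) - set(src_keep.tolist()))
    tgt_dangling = sorted(set(range(len(tgt_emb))) - set(tgt_keep.tolist()))
    return pairs, src_dangling, tgt_dangling
```

Calling `detect_and_align(src_emb, tgt_emb)` with two NumPy arrays of shape `(n_entities, dim)` returns the aligned index pairs plus the indices flagged as dangling in each graph. Removing dangling entities before the assignment step matters because an assignment solver otherwise forces every entity into some match, including entities that have no true counterpart.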




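Source journal

Information Systems (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days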

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers dealing with massively parallel data management, fault tolerance in practice, and special-purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.