PathEL: A novel collective entity linking method based on relationship paths in heterogeneous information networks

IF 3 | Zone 2, Computer Science | JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Pub Date : 2024-08-13 DOI:10.1016/j.is.2024.102433

Lizheng Zu, Lin Lin, Song Fu, Jie Liu, Shiwei Suo, Wenhui He, Jinlei Wu, Yancheng Lv

Information Systems, volume 126, Article 102433. https://www.sciencedirect.com/science/article/pii/S0306437924000917

Citations: 0

Abstract

Collective entity linking always outperforms independent entity linking because it considers the interdependencies among entities. However, existing collective entity linking methods often have high time complexity, do not fully exploit the relationship information in heterogeneous information networks (HINs), and mostly depend on features specific to Wikipedia. To address these problems, this paper proposes PathEL, a novel collective entity linking method based on relationship paths in heterogeneous information networks. PathEL classifies the complex relationships in a HIN into 1-hop paths and three types of 2-hop paths, measures entity correlation from the path information among entities, and combines this with textual semantic information to perform collective entity linking. To reduce the high complexity of collective entity linking, the paper combines a variable sliding window data processing method with a two-step pruning strategy: the variable sliding window limits the number of entity mentions in each window, and the pruning strategy reduces the number of candidate entities. Experimental results on three benchmark datasets show that the proposed model outperforms the baseline models in entity linking. On the AIDA CoNLL dataset, it improves the P, R, and F1 scores by 1.61%, 1.54%, and 1.57%, respectively, over the second-ranked model.
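To see why limiting the mentions per window and pruning candidates reduces the cost of joint inference, the following is a minimal sketch, not the paper's implementation. The abstract does not say how window boundaries vary or what the two pruning steps are, so this sketch assumes a fixed cap on mentions per window and a single top-k pruning step over a hypothetical local score. All function names, entity names, and scores are illustrative assumptions.

```python
from math import prod


def split_into_windows(mentions, max_mentions):
    """Group mentions, in document order, into windows of at most max_mentions.

    Assumption: a fixed cap per window; the paper's variable window rule
    is not described in the abstract.
    """
    if max_mentions < 1:
        raise ValueError("max_mentions must be at least 1")
    return [mentions[i:i + max_mentions]
            for i in range(0, len(mentions), max_mentions)]


def prune_candidates(scored_candidates, top_k):
    """Keep the top_k candidates ranked by a (hypothetical) local score."""
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    return [entity for entity, _ in ranked[:top_k]]


def joint_search_space(candidate_lists):
    """Number of joint assignments when every combination is considered."""
    return prod(len(candidates) for candidates in candidate_lists)


# Toy data: mention -> [(candidate entity, local score)]. All values are made up.
candidates = {
    "Paris": [("Paris_(France)", 0.90), ("Paris_(Texas)", 0.07), ("Paris_Hilton", 0.03)],
    "Texas": [("Texas_(state)", 0.80), ("Texas_(band)", 0.20)],
    "Dallas": [("Dallas_(city)", 0.70), ("Dallas_(TV_series)", 0.30)],
    "Austin": [("Austin_(city)", 0.85), ("Austin_Powers", 0.15)],
}
mentions = list(candidates)  # document order

# Without windows or pruning: 3 * 2 * 2 * 2 joint assignments.
full_space = joint_search_space([candidates[m] for m in mentions])

# With windows of two mentions and top-2 pruning: (2 * 2) + (2 * 2).
windows = split_into_windows(mentions, max_mentions=2)
pruned = {m: prune_candidates(candidates[m], top_k=2) for m in mentions}
windowed_space = sum(joint_search_space([pruned[m] for m in w]) for w in windows)

print(full_space, windowed_space)  # prints: 24 8
```

The joint search space grows as the product of the candidate counts within a window, so capping the window size bounds the number of factors and pruning bounds each factor. The trade-off in this sketch is that mentions in different windows are no longer linked jointly, and a local top-k cut can discard the correct entity.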


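Source journal

Information Systems (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days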

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers on massively parallel data management, fault tolerance in practice, and special-purpose hardware for data-intensive systems are also welcome, as are manuscripts from application domains such as urban informatics, social and natural science, and the Internet of Things. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.