LLM-augmented entity alignment: an unsupervised and training-free framework

IF 6.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Meixiu Long, Jiahai Wang, Junxiao Ma, Jianpeng Zhou, Siyuan Chen
{"title":"法学硕士增强实体对齐:无监督和无培训框架。","authors":"Meixiu Long ,&nbsp;Jiahai Wang ,&nbsp;Junxiao Ma ,&nbsp;Jianpeng Zhou ,&nbsp;Siyuan Chen","doi":"10.1016/j.neunet.2025.108139","DOIUrl":null,"url":null,"abstract":"<div><div>Entity alignment (EA) is a fundamental task in knowledge graph (KG) integration, aiming to identify equivalent entities across different KGs for a unified and comprehensive representation. Recent advances have explored pre-trained language models (PLMs) to enhance the semantic understanding of entities, achieving notable improvements. However, existing methods face two major limitations. First, they rely heavily on human-annotated labels for training, leading to high computational costs and poor scalability. Second, some approaches use large language models (LLMs) to predict alignments in a multi-choice question format, but LLM outputs may deviate from expected formats, and predefined options may exclude correct matches, leading to suboptimal performance. To address these issues, we propose LEA, an LLM-augmented entity alignment framework that eliminates the need for labeled data and enhances robustness by mitigating information heterogeneity at both embedding and semantic levels. LEA first introduces an entity textualization module that transforms structural and textual information into a unified format, ensuring consistency and improving entity representations. It then leverages LLMs to enrich entity descriptions, enhancing semantic distinctiveness. Finally, these enriched descriptions are encoded into a shared embedding space, enabling efficient alignment through text retrieval techniques. To balance performance and computational cost, we further propose a selective augmentation strategy that prioritizes the most ambiguous entities for refinement. Experimental results on both homogeneous and heterogeneous KGs demonstrate that LEA outperforms existing models trained on 30 % labeled data, achieving a 30 % absolute improvement in Hit@1 score. As LLMs and text embedding models advance, LEA is expected to further enhance EA performance, providing a scalable and robust paradigm for practical applications. The code and dataset can be found at <span><span>https://github.com/Longmeix/LEA</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"194 ","pages":"Article 108139"},"PeriodicalIF":6.3000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM-augmented entity alignment: an unsupervised and training-free framework\",\"authors\":\"Meixiu Long ,&nbsp;Jiahai Wang ,&nbsp;Junxiao Ma ,&nbsp;Jianpeng Zhou ,&nbsp;Siyuan Chen\",\"doi\":\"10.1016/j.neunet.2025.108139\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Entity alignment (EA) is a fundamental task in knowledge graph (KG) integration, aiming to identify equivalent entities across different KGs for a unified and comprehensive representation. Recent advances have explored pre-trained language models (PLMs) to enhance the semantic understanding of entities, achieving notable improvements. However, existing methods face two major limitations. First, they rely heavily on human-annotated labels for training, leading to high computational costs and poor scalability. 
Second, some approaches use large language models (LLMs) to predict alignments in a multi-choice question format, but LLM outputs may deviate from expected formats, and predefined options may exclude correct matches, leading to suboptimal performance. To address these issues, we propose LEA, an LLM-augmented entity alignment framework that eliminates the need for labeled data and enhances robustness by mitigating information heterogeneity at both embedding and semantic levels. LEA first introduces an entity textualization module that transforms structural and textual information into a unified format, ensuring consistency and improving entity representations. It then leverages LLMs to enrich entity descriptions, enhancing semantic distinctiveness. Finally, these enriched descriptions are encoded into a shared embedding space, enabling efficient alignment through text retrieval techniques. To balance performance and computational cost, we further propose a selective augmentation strategy that prioritizes the most ambiguous entities for refinement. Experimental results on both homogeneous and heterogeneous KGs demonstrate that LEA outperforms existing models trained on 30 % labeled data, achieving a 30 % absolute improvement in Hit@1 score. As LLMs and text embedding models advance, LEA is expected to further enhance EA performance, providing a scalable and robust paradigm for practical applications. The code and dataset can be found at <span><span>https://github.com/Longmeix/LEA</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"194 \",\"pages\":\"Article 108139\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2025-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025010196\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025010196","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Entity alignment (EA) is a fundamental task in knowledge graph (KG) integration, aiming to identify equivalent entities across different KGs for a unified and comprehensive representation. Recent advances have explored pre-trained language models (PLMs) to enhance the semantic understanding of entities, achieving notable improvements. However, existing methods face two major limitations. First, they rely heavily on human-annotated labels for training, leading to high computational costs and poor scalability. Second, some approaches use large language models (LLMs) to predict alignments in a multiple-choice question format, but LLM outputs may deviate from expected formats, and predefined options may exclude correct matches, leading to suboptimal performance. To address these issues, we propose LEA, an LLM-augmented entity alignment framework that eliminates the need for labeled data and enhances robustness by mitigating information heterogeneity at both the embedding and semantic levels. LEA first introduces an entity textualization module that transforms structural and textual information into a unified format, ensuring consistency and improving entity representations. It then leverages LLMs to enrich entity descriptions, enhancing semantic distinctiveness. Finally, these enriched descriptions are encoded into a shared embedding space, enabling efficient alignment through text retrieval techniques. To balance performance and computational cost, we further propose a selective augmentation strategy that prioritizes the most ambiguous entities for refinement. Experimental results on both homogeneous and heterogeneous KGs demonstrate that LEA outperforms existing models trained on 30% labeled data, achieving a 30% absolute improvement in Hit@1 score. As LLMs and text embedding models advance, LEA is expected to further enhance EA performance, providing a scalable and robust paradigm for practical applications. The code and dataset can be found at https://github.com/Longmeix/LEA.
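
To make the described pipeline concrete, here is a minimal sketch of the textualize / enrich / embed / retrieve loop the abstract outlines. The embedding model, the margin-based ambiguity heuristic, and every helper name (textualize, llm_enrich, align) are our illustrative assumptions, not the authors' implementation; the actual code lives at the GitHub link above.

```python
# Minimal sketch of an LEA-style pipeline, under the assumptions stated above.
import numpy as np
from sentence_transformers import SentenceTransformer


def textualize(entity: dict) -> str:
    """Entity textualization: flatten name, attributes, and relation
    triples into one unified text string."""
    attrs = "; ".join(f"{k}: {v}" for k, v in entity.get("attributes", {}).items())
    rels = "; ".join(f"{r} {t}" for r, t in entity.get("relations", []))
    return f"{entity['name']}. Attributes: {attrs}. Relations: {rels}."


def llm_enrich(text: str, llm=lambda prompt: prompt) -> str:
    """LLM description enrichment. `llm` stands in for any chat-completion
    call; the default identity stub keeps the sketch runnable offline."""
    return llm(f"Write a distinctive one-paragraph description of this entity:\n{text}")


def align(src_entities, tgt_entities, llm=lambda p: p, margin=0.05):
    """Return, for each source entity, the index of its best target match."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # any text embedder works
    src_texts = [textualize(e) for e in src_entities]
    tgt_texts = [textualize(e) for e in tgt_entities]
    src_emb = model.encode(src_texts, normalize_embeddings=True)
    tgt_emb = model.encode(tgt_texts, normalize_embeddings=True)
    sims = src_emb @ tgt_emb.T  # cosine similarity (embeddings are unit norm)
    matches = sims.argmax(axis=1)
    # Selective augmentation: only entities whose top-1 and top-2 scores are
    # close get the expensive LLM refinement -- one plausible reading of
    # "prioritizes the most ambiguous entities".
    top2 = np.sort(sims, axis=1)[:, -2:]  # per row: [second-best, best]
    for i in np.where(top2[:, 1] - top2[:, 0] < margin)[0]:
        enriched = llm_enrich(src_texts[i], llm)
        vec = model.encode([enriched], normalize_embeddings=True)[0]
        matches[i] = int(np.argmax(vec @ tgt_emb.T))
    return matches
```

Because alignment reduces to nearest-neighbor retrieval in a shared embedding space, this sketch needs no seed alignments or training; the margin gate caps how many entities incur an LLM call, mirroring the performance/cost trade-off the abstract describes.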
Source journal: Neural Networks (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Annual articles: 425
Review time: 67 days
Journal introduction: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.