LLM-augmented entity alignment: an unsupervised and training-free framework
Meixiu Long, Jiahai Wang, Junxiao Ma, Jianpeng Zhou, Siyuan Chen
Neural Networks, Volume 194, Article 108139, published 2025-09-22
DOI: 10.1016/j.neunet.2025.108139
Citations: 0
Abstract
Entity alignment (EA) is a fundamental task in knowledge graph (KG) integration, aiming to identify equivalent entities across different KGs for a unified and comprehensive representation. Recent advances have explored pre-trained language models (PLMs) to enhance the semantic understanding of entities, achieving notable improvements. However, existing methods face two major limitations. First, they rely heavily on human-annotated labels for training, leading to high computational costs and poor scalability. Second, some approaches use large language models (LLMs) to predict alignments in a multiple-choice question format, but LLM outputs may deviate from expected formats, and predefined options may exclude correct matches, leading to suboptimal performance. To address these issues, we propose LEA, an LLM-augmented entity alignment framework that eliminates the need for labeled data and enhances robustness by mitigating information heterogeneity at both embedding and semantic levels. LEA first introduces an entity textualization module that transforms structural and textual information into a unified format, ensuring consistency and improving entity representations. It then leverages LLMs to enrich entity descriptions, enhancing semantic distinctiveness. Finally, these enriched descriptions are encoded into a shared embedding space, enabling efficient alignment through text retrieval techniques. To balance performance and computational cost, we further propose a selective augmentation strategy that prioritizes the most ambiguous entities for refinement. Experimental results on both homogeneous and heterogeneous KGs demonstrate that LEA outperforms existing models trained on 30% labeled data, achieving a 30% absolute improvement in Hit@1 score. As LLMs and text embedding models advance, LEA is expected to further enhance EA performance, providing a scalable and robust paradigm for practical applications. The code and dataset can be found at https://github.com/Longmeix/LEA.
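The abstract describes a pipeline of textualization, LLM enrichment, and embedding-based retrieval, with a selective strategy that spends LLM calls only on ambiguous entities. The sketch below illustrates that general idea, not the authors' actual implementation (see the repository above for that): it embeds entity descriptions, matches each source entity to its nearest target by cosine similarity, and flags low-margin matches as candidates for enrichment. The encoder choice, margin threshold, and all names here are assumptions for illustration.

```python
# Minimal sketch of retrieval-based entity alignment with margin-based
# ambiguity selection (illustration only; not the paper's code).
import numpy as np
from sentence_transformers import SentenceTransformer  # any text encoder works


def align(src_texts, tgt_texts, margin=0.05):
    """Embed descriptions, match by cosine similarity, and flag
    low-margin (ambiguous) source entities for LLM enrichment."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
    # With normalized embeddings, the dot product equals cosine similarity.
    src = model.encode(src_texts, normalize_embeddings=True)
    tgt = model.encode(tgt_texts, normalize_embeddings=True)
    sim = src @ tgt.T                    # (n_src, n_tgt) similarity matrix
    matches = sim.argmax(axis=1)         # best target index per source entity
    top2 = np.sort(sim, axis=1)[:, -2:]  # two highest scores per source entity
    # Entities whose top-1 and top-2 scores are close are "ambiguous" and,
    # in a selective-augmentation scheme, would be sent to an LLM for
    # description enrichment before being re-encoded and re-matched.
    ambiguous = np.where(top2[:, 1] - top2[:, 0] < margin)[0]
    return matches, ambiguous


if __name__ == "__main__":
    src = ["Paris, capital city of France", "Apple Inc., technology company"]
    tgt = ["Apple, American consumer-electronics firm", "Paris, France's capital"]
    matches, ambiguous = align(src, tgt)
    print(matches)    # expected: [1, 0]
    print(ambiguous)  # source indices flagged for enrichment
```

Under this reading, unambiguous entities skip the costly LLM step entirely, which is how a selective strategy of this kind trades a small amount of accuracy risk for a large reduction in API cost.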
Journal introduction
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.