BoostER: Leveraging Large Language Models for Enhancing Entity Resolution

ArXiv Pub Date: 2024-03-11 DOI: 10.1145/3589335.3651245

Huahang Li, Shuangyin Li, Fei Hao, C. Zhang, Yuanfeng Song, Lei Chen


Citations: 0

Abstract

Entity resolution, which involves identifying and merging records that refer to the same real-world entity, is a crucial task in areas like Web data integration. Its importance is underscored by the many duplicated and multi-version data resources on the Web. However, achieving high-quality entity resolution typically demands significant effort. Large Language Models (LLMs) like GPT-4 have demonstrated advanced linguistic capabilities, which may offer a new paradigm for this task. In this paper, we propose a demonstration system named BoostER that examines the possibility of leveraging LLMs in the entity resolution process, revealing advantages in both ease of deployment and low cost. Our approach optimally selects a set of matching questions and poses them to LLMs for verification, then refines the distribution of entity resolution results using the LLMs' responses. This offers a promising path to high-quality entity resolution results in real-world applications, especially for individuals or small companies, without the need for extensive model training or significant financial investment.
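The abstract does not spell out how matching questions are chosen or how the result distribution is refined. One plausible reading, sketched below in Python, keeps a probability distribution over candidate resolutions, asks about the record pair whose answer is expected to reduce uncertainty the most, and applies a Bayesian update to the distribution. Everything in the sketch is an assumption made for illustration: the record IDs, the prior probabilities, the entropy-based selection rule, and the `accuracy` parameter modelling how often the LLM answers correctly. It should not be read as BoostER's actual algorithm.

```python
import math

# Hypothetical candidate resolutions. Each one is the set of record pairs it
# treats as matches, paired with an illustrative prior probability.
candidates = {
    frozenset({("r1", "r2"), ("r3", "r4")}): 0.5,
    frozenset({("r1", "r2")}): 0.3,
    frozenset({("r3", "r4")}): 0.2,
}

def entropy(dist):
    """Shannon entropy (in bits) of a distribution over candidate resolutions."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def update(dist, pair, answer, accuracy=0.9):
    """Bayesian update after the LLM answers whether `pair` is a match.

    `accuracy` is an assumed probability that the LLM's answer is correct.
    """
    posterior = {}
    for resolution, p in dist.items():
        agrees = (pair in resolution) == answer
        posterior[resolution] = p * (accuracy if agrees else 1 - accuracy)
    total = sum(posterior.values())
    return {r: p / total for r, p in posterior.items()}

def expected_entropy(dist, pair, accuracy=0.9):
    """Expected entropy of the distribution after asking about `pair`."""
    result = 0.0
    for answer in (True, False):
        # Probability that the LLM gives this answer under the current distribution.
        p_answer = sum(
            p * (accuracy if (pair in r) == answer else 1 - accuracy)
            for r, p in dist.items()
        )
        if p_answer > 0:
            result += p_answer * entropy(update(dist, pair, answer, accuracy))
    return result

def best_question(dist, pairs, accuracy=0.9):
    """Pick the pair whose answer is expected to leave the least uncertainty."""
    return min(pairs, key=lambda pair: expected_entropy(dist, pair, accuracy))

pairs = [("r1", "r2"), ("r3", "r4")]
question = best_question(candidates, pairs)

# Placeholder for a real LLM reply to "do these two records match?".
llm_answer = True
refined = update(candidates, question, llm_answer)
```

In this toy setup the selected question is the pair the candidates disagree about most evenly, and a "match" answer shifts probability toward the resolutions that contain that pair. A real system would also need to account for the cost of each LLM call, which this sketch ignores.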
