基于大型语言模型的语义感知查询应答

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering Pub Date : 2025-08-05 DOI:10.1016/j.datak.2025.102494

Paolo Atzeni , Teodoro Baldazzi , Luigi Bellomarini , Eleonora Laurenza , Emanuel Sallinger

{"title":"基于大型语言模型的语义感知查询应答","authors":"Paolo Atzeni , Teodoro Baldazzi , Luigi Bellomarini , Eleonora Laurenza , Emanuel Sallinger","doi":"10.1016/j.datak.2025.102494","DOIUrl":null,"url":null,"abstract":"<div><div>In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"161 ","pages":"Article 102494"},"PeriodicalIF":2.7000,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic-aware query answering with Large Language Models\",\"authors\":\"Paolo Atzeni , Teodoro Baldazzi , Luigi Bellomarini , Eleonora Laurenza , Emanuel Sallinger\",\"doi\":\"10.1016/j.datak.2025.102494\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.</div></div>\",\"PeriodicalId\":55184,\"journal\":{\"name\":\"Data & Knowledge Engineering\",\"volume\":\"161 \",\"pages\":\"Article 102494\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data & Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169023X25000898\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X25000898","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

在现代数据驱动的世界中，回答对异构和语义不一致数据的查询仍然是一个重大挑战。现代数据集来自不同的来源，例如关系数据库、半结构化存储库和非结构化文档，这导致模式、术语和数据格式存在很大的可变性。传统系统受到严格的语法匹配和严格的数据绑定的约束，难以捕获关键的语义连接和模式歧义，无法满足数据科学家对查询回答中高级形式的灵活性和上下文感知的日益增长的需求。与此同时，大型语言模型（llm）的出现为自然语言解释引入了新的功能，使它们在解决此类挑战方面非常有希望。然而，法学硕士本身缺乏在高风险领域进行稳健查询处理和决策所需的系统严谨性和可解释性。在本文中，我们提出了软查询应答（Soft QA），这是一种新颖的混合方法，它将llm作为查询处理管道中的中间语义层集成在一起。软QA通过上下文感知、模式通知提示注入语义理解来增强查询回答的适应性和灵活性，并利用llm在复杂设置中语义链接实体、解决歧义并提供准确的查询结果。我们通过现实世界的例子展示了它的实际有效性，强调了它解决语义不匹配和改进查询结果的能力，而不需要大量的数据清理或重组。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Semantic-aware query answering with Large Language Models

In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Data & Knowledge Engineering 工程技术-计算机：人工智能

CiteScore

5.00

自引率

0.00%

发文量

审稿时长

6 months

期刊介绍： Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.