Paolo Atzeni , Teodoro Baldazzi , Luigi Bellomarini , Eleonora Laurenza , Emanuel Sallinger
{"title":"基于大型语言模型的语义感知查询应答","authors":"Paolo Atzeni , Teodoro Baldazzi , Luigi Bellomarini , Eleonora Laurenza , Emanuel Sallinger","doi":"10.1016/j.datak.2025.102494","DOIUrl":null,"url":null,"abstract":"<div><div>In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"161 ","pages":"Article 102494"},"PeriodicalIF":2.7000,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic-aware query answering with Large Language Models\",\"authors\":\"Paolo Atzeni , Teodoro Baldazzi , Luigi Bellomarini , Eleonora Laurenza , Emanuel Sallinger\",\"doi\":\"10.1016/j.datak.2025.102494\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.</div></div>\",\"PeriodicalId\":55184,\"journal\":{\"name\":\"Data & Knowledge Engineering\",\"volume\":\"161 \",\"pages\":\"Article 102494\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data & Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169023X25000898\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X25000898","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Semantic-aware query answering with Large Language Models
In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.
期刊介绍:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.