Hybrid Querying Over Relational Databases and Large Language Models

arXiv - CS - Databases. Pub Date: 2024-08-01. DOI: arxiv-2408.00884

Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi

Citations: 0

Abstract

Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present the first cross-domain benchmark, SWAN, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions in SWAN, we present HQDL, a preliminary solution for hybrid querying, and discuss potential future directions. Our evaluation demonstrates that HQDL using GPT-4 Turbo with few-shot prompts achieves 40.0% execution accuracy and 48.2% data factuality. These results highlight both the potential and the challenges of hybrid querying. We believe that our work will inspire further research in creating more efficient and accurate data systems that seamlessly integrate relational databases and large language models to address beyond-database questions.
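To make the idea of a beyond-database question concrete, here is a minimal Python sketch of hybrid querying. It is not the HQDL implementation described in the paper. The `city` table, its made-up population figures, the `llm_city_country` temp table, and the `stub_llm_lookup` function (a hard-coded dictionary standing in for a real LLM call) are all assumptions introduced for illustration. The database stores city populations but not countries, so the missing attribute is generated outside the database, materialized as a table, and then joined with ordinary SQL.

```python
import sqlite3


def stub_llm_lookup(city: str) -> str:
    """Hypothetical stand-in for an LLM call that returns a city's country.

    A real system would prompt a language model here; this stub uses a
    fixed dictionary so the example runs offline and deterministically.
    """
    knowledge = {"Paris": "France", "Lyon": "France", "Osaka": "Japan"}
    return knowledge.get(city, "unknown")


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database that has populations but no countries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (name TEXT PRIMARY KEY, population INTEGER)")
    # Illustrative figures only, not real statistics.
    conn.executemany(
        "INSERT INTO city VALUES (?, ?)",
        [("Paris", 2100000), ("Lyon", 500000), ("Osaka", 2700000)],
    )
    return conn


def hybrid_query(conn: sqlite3.Connection, country: str) -> list:
    """Answer 'which stored cities are in <country>?' with SQL plus generated data."""
    # Step 1: read the keys that need an attribute the database does not store.
    cities = [row[0] for row in conn.execute("SELECT name FROM city ORDER BY name")]

    # Step 2: materialize the generated attribute as a temporary table.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS llm_city_country "
        "(name TEXT PRIMARY KEY, country TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO llm_city_country VALUES (?, ?)",
        [(c, stub_llm_lookup(c)) for c in cities],
    )

    # Step 3: answer the question with a regular SQL join over both tables.
    return conn.execute(
        "SELECT c.name, c.population "
        "FROM city AS c JOIN llm_city_country AS g ON g.name = c.name "
        "WHERE g.country = ? ORDER BY c.name",
        (country,),
    ).fetchall()


if __name__ == "__main__":
    print(hybrid_query(build_demo_db(), "France"))
```

In this sketch the correctness of the final answer depends entirely on the generated rows, which is one way to see why the abstract reports data factuality alongside execution accuracy.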


