Hybrid Querying Over Relational Databases and Large Language Models

Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi
arXiv - CS - Databases, published 2024-08-01. DOI: arxiv-2408.00884 (https://doi.org/arxiv-2408.00884)

Abstract

Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present SWAN, the first cross-domain benchmark, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing the complex questions in SWAN, we present HQDL, a preliminary solution for hybrid querying, and discuss potential future directions. Our evaluation demonstrates that HQDL, using GPT-4 Turbo with few-shot prompts, achieves 40.0% in execution accuracy and 48.2% in data factuality. These results highlight both the potential of and the challenges for hybrid querying. We believe that our work will inspire further research in creating more efficient and accurate data systems that seamlessly integrate relational databases and large language models to address beyond-database questions.
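To make the idea of a beyond-database question concrete, the following is a minimal sketch of one possible hybrid-querying pattern: values missing from the database are generated, materialized as an extra table, and then joined with the stored data in ordinary SQL. It is an illustration under stated assumptions, not the HQDL implementation. The schema, the toy rows, and the function `fake_llm_lookup` (a canned stand-in for a real language-model call) are all hypothetical and are not taken from SWAN.

```python
import sqlite3


def fake_llm_lookup(name: str) -> str:
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    A real system would prompt a language model here, and the generated
    value could be wrong.
    """
    canned = {"Batman": "DC Comics", "Spider-Man": "Marvel Comics"}
    return canned.get(name, "unknown")


def build_db() -> sqlite3.Connection:
    """Create a toy in-memory database that stores names but no publisher."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hero (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO hero (name) VALUES (?)",
        [("Batman",), ("Spider-Man",), ("Hellboy",)],
    )
    return conn


def materialize_llm_column(conn: sqlite3.Connection) -> None:
    """Fill a side table with generated values for the missing attribute."""
    conn.execute("CREATE TABLE hero_llm (hero_id INTEGER, publisher TEXT)")
    rows = conn.execute("SELECT id, name FROM hero").fetchall()
    conn.executemany(
        "INSERT INTO hero_llm VALUES (?, ?)",
        [(hero_id, fake_llm_lookup(name)) for hero_id, name in rows],
    )


def hybrid_query(conn: sqlite3.Connection) -> list[str]:
    """Answer a question the stored data alone cannot: which heroes are Marvel's?"""
    sql = """
        SELECT h.name
        FROM hero AS h
        JOIN hero_llm AS l ON l.hero_id = h.id
        WHERE l.publisher = 'Marvel Comics'
        ORDER BY h.name
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_db()
    materialize_llm_column(conn)
    print(hybrid_query(conn))
```

In this sketch the answer is only as reliable as the generated column, which is why a benchmark would need to score the factuality of the generated data separately from the correctness of the query result.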