Hybrid Querying Over Relational Databases and Large Language Models
Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi
arXiv - CS - Databases, 2024-08-01 (arXiv:2408.00884)
Citation count: 0
Abstract
Database queries traditionally operate under the closed-world assumption,
providing no answers to questions that require information beyond the data
stored in the database. Hybrid querying using SQL offers an alternative by
integrating relational databases with large language models (LLMs) to answer
beyond-database questions. In this paper, we present the first cross-domain
benchmark, SWAN, containing 120 beyond-database questions over four real-world
databases. To leverage state-of-the-art language models in addressing these
complex questions in SWAN, we present HQDL, a preliminary solution for hybrid
querying, and also discuss potential future directions. Our evaluation
demonstrates that HQDL, using GPT-4 Turbo with few-shot prompts, achieves 40.0%
in execution accuracy and 48.2% in data factuality. These results highlight
both the potential and the challenges of hybrid querying. We believe that our work
will inspire further research in creating more efficient and accurate data
systems that seamlessly integrate relational databases and large language
models to address beyond-database questions.
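The following Python sketch illustrates one generic way a SQL query can reach beyond the data it stores. It registers a user-defined function with SQLite that stands in for an LLM call. This is only an illustration of the hybrid-querying idea and is not HQDL. The `llm_lookup` function, the `universities` table, and the hard-coded answers are all hypothetical. A real system would replace the dictionary with a prompt sent to a model such as GPT-4 Turbo.

```python
import sqlite3

# Hypothetical stand-in for an LLM call. A real hybrid system would send a
# prompt such as "In which US state is {entity} located?" to a model; here a
# small hard-coded dictionary plays that role so the sketch runs offline.
_FAKE_LLM_KNOWLEDGE = {
    ("UC Santa Barbara", "state"): "California",
    ("Stanford University", "state"): "California",
    ("MIT", "state"): "Massachusetts",
}


def llm_lookup(entity: str, attribute: str):
    """Return a value for (entity, attribute) that the database does not store.

    Returns None (SQL NULL) when the stand-in "model" has no answer.
    """
    return _FAKE_LLM_KNOWLEDGE.get((entity, attribute))


def run_hybrid_query() -> list:
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call it like a built-in.
    conn.create_function("llm_lookup", 2, llm_lookup)

    # The database only knows university names, not the states they are in.
    conn.execute("CREATE TABLE universities (name TEXT)")
    conn.executemany(
        "INSERT INTO universities (name) VALUES (?)",
        [("UC Santa Barbara",), ("MIT",), ("Stanford University",)],
    )

    # Beyond-database question: "Which universities are in California?"
    # Rows where llm_lookup returns NULL fail the comparison and are dropped.
    rows = conn.execute(
        "SELECT name FROM universities "
        "WHERE llm_lookup(name, 'state') = 'California' "
        "ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(run_hybrid_query())  # ['Stanford University', 'UC Santa Barbara']
```

Calling a model once per row keeps the SQL unchanged but can be slow and costly on large tables. One alternative design, assumed here rather than taken from the paper, is to have the LLM fill in the missing column once, up front, and then run ordinary SQL over the expanded table.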