Evaluating LLMs for Text-to-SQL Generation With Complex SQL Workload

Limin Ma, Ken Pu, Ying Zhu
arXiv - CS - Databases. Published 2024-07-28. DOI: arxiv-2407.19517 (https://doi.org/arxiv-2407.19517)

Abstract

This study presents a comparative analysis of a complex SQL benchmark, TPC-DS, with two existing text-to-SQL benchmarks, BIRD and Spider. Our findings reveal that TPC-DS queries exhibit a significantly higher level of structural complexity than the other two benchmarks. This underscores the need for more intricate benchmarks that simulate realistic scenarios effectively. To facilitate this comparison, we devised several measures of structural complexity and applied them across all three benchmarks. The results of this study can guide future research in the development of more sophisticated text-to-SQL benchmarks.

We used 11 distinct large language models (LLMs) to generate SQL queries from the query descriptions provided by the TPC-DS benchmark. The prompt engineering process incorporated both the query description as outlined in the TPC-DS specification and the TPC-DS database schema. We compared the generated queries with the TPC-DS gold standard queries using a series of fuzzy structure matching techniques based on query features. Our findings indicate that current state-of-the-art generative AI models fall short in generating accurate decision-making queries: the accuracy of the generated queries is insufficient for practical real-world application.
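The abstract does not spell out the structural complexity measures or the fuzzy matching procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical set of keyword-level features (`FEATURE_PATTERNS`) counted with regular expressions, and it assumes a weighted Jaccard similarity as the fuzzy score; both choices and all names are ours.

```python
import re
from collections import Counter

# Hypothetical feature set: the abstract does not list the actual measures.
FEATURE_PATTERNS = {
    "select": r"\bselect\b",
    "join": r"\bjoin\b",
    "group_by": r"\bgroup\s+by\b",
    "order_by": r"\border\s+by\b",
    "cte": r"\bwith\b",
    "set_op": r"\b(?:union|intersect|except)\b",
    "aggregate": r"\b(?:sum|avg|count|min|max)\s*\(",
}


def structural_features(sql: str) -> Counter:
    """Count keyword-level structural features in a SQL string."""
    text = sql.lower()
    return Counter(
        {name: len(re.findall(pattern, text)) for name, pattern in FEATURE_PATTERNS.items()}
    )


def fuzzy_match(generated: str, gold: str) -> float:
    """Weighted Jaccard similarity of the two feature-count vectors.

    Returns 1.0 when both queries have identical feature counts and
    moves toward 0.0 as the counts diverge.
    """
    gen_counts = structural_features(generated)
    gold_counts = structural_features(gold)
    keys = set(gen_counts) | set(gold_counts)
    union = sum(max(gen_counts[k], gold_counts[k]) for k in keys)
    if union == 0:
        # Neither query contains any tracked feature.
        return 1.0
    overlap = sum(min(gen_counts[k], gold_counts[k]) for k in keys)
    return overlap / union
```

A keyword counter like this ignores string literals, comments, and nesting depth; a real evaluation would more plausibly extract features from a parsed query tree. The sketch only shows how a structurally simple generated query can score low against a gold query that uses joins, aggregation, and common table expressions.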