SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement

Chaofan Li, Yingxia Shao, Zheng Liu
arXiv - CS - Databases, 2024-08-09. DOI: arxiv-2408.04919 (https://doi.org/arxiv-2408.04919)

Abstract

Recent advances in large language models (LLMs) have significantly advanced the Text-to-SQL task. Many of these works rely on post-correction of SQL queries. However, most of this post-correction consists of analyzing error cases to develop prompts with rules that eliminate model bias, and the SQL queries are not verified by execution. In addition, prevalent techniques depend primarily on GPT-4 and few-shot prompts, which makes them expensive. To investigate effective and cost-efficient methods for SQL refinement, we introduce Semantic-Enhanced Text-to-SQL with Adaptive Refinement (SEA-SQL), which includes Adaptive Bias Elimination and Dynamic Execution Adjustment and aims to improve performance while minimizing resource expenditure with zero-shot prompts. Specifically, SEA-SQL employs a semantic-enhanced schema to augment database information and optimize SQL queries. During SQL query generation, a fine-tuned adaptive bias eliminator is applied to mitigate inherent biases caused by the LLM. Dynamic execution adjustment is then used to guarantee the executability of the bias-eliminated SQL query. We conduct experiments on the Spider and BIRD datasets to demonstrate the effectiveness of this framework. The results show that SEA-SQL achieves state-of-the-art performance in the GPT-3.5 scenario with 9%-58% of the generation cost. Furthermore, SEA-SQL is comparable to GPT-4 with only 0.9%-5.3% of the generation cost.
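
The abstract gives no implementation details for dynamic execution adjustment, so the following is only a minimal sketch of the general idea of execution verification: run the candidate query, and if the database reports an error, pass the query and the error message to a repair step and try again. The names `execute_with_adjustment` and `toy_repair`, the SQLite backend, and the retry limit are all assumptions made for illustration. In a real system the repair step would presumably be an LLM call; here it is a hard-coded stand-in.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple


def execute_with_adjustment(
    conn: sqlite3.Connection,
    sql: str,
    repair: Callable[[str, str], str],
    max_repairs: int = 3,
) -> Tuple[Optional[List[tuple]], str]:
    """Execute sql; on an execution error, repair the query and retry.

    Returns (rows, final_sql). rows is None if the query still fails
    after max_repairs repair attempts.
    """
    for attempt in range(max_repairs + 1):
        try:
            return conn.execute(sql).fetchall(), sql
        except sqlite3.Error as exc:
            if attempt == max_repairs:
                break
            # Hypothetical repair step: receives the failing query and
            # the database error message, returns a new candidate query.
            sql = repair(sql, str(exc))
    return None, sql


def toy_repair(sql: str, error: str) -> str:
    """Stand-in for an LLM call (assumption): fixes one known typo."""
    if "no such column" in error:
        return sql.replace("nme", "name")
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO singer VALUES (1, 'Ana')")

rows, final_sql = execute_with_adjustment(conn, "SELECT nme FROM singer", toy_repair)
print(rows, final_sql)
```

The loop only checks that a query executes. It says nothing about whether the result answers the question, which is why the abstract pairs this step with a separate bias-elimination stage.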