SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement
Chaofan Li, Yingxia Shao, Zheng Liu
arXiv:2408.04919 (arXiv - CS - Databases), 2024-08-09
Recent advancements in large language models (LLMs) have significantly
contributed to the progress of the Text-to-SQL task. A common requirement in
many of these works is the post-correction of SQL queries. However, most of
this process consists of analyzing error cases to develop prompts with
rules that eliminate model bias, and the SQL queries are not verified by
execution. In addition, the prevalent techniques primarily
depend on GPT-4 and few-shot prompts, which makes them expensive. To
investigate effective methods for SQL refinement in a cost-efficient
manner, we introduce Semantic-Enhanced Text-to-SQL with Adaptive Refinement
(SEA-SQL), which includes Adaptive Bias Elimination and Dynamic Execution
Adjustment and aims to improve performance while minimizing resource expenditure
with zero-shot prompts. Specifically, SEA-SQL employs a semantic-enhanced
schema to augment database information and optimize SQL queries. During the SQL
query generation, a fine-tuned adaptive bias eliminator is applied to mitigate
inherent biases caused by the LLM. The dynamic execution adjustment is utilized
to guarantee the executability of the bias-eliminated SQL query. We conduct
experiments on the Spider and BIRD datasets to demonstrate the effectiveness of
this framework. The results show that SEA-SQL achieves state-of-the-art
performance in the GPT-3.5 scenario with 9%-58% of the generation cost.
Furthermore, SEA-SQL is comparable to GPT-4 with only 0.9%-5.3% of the
generation cost.
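
The abstract names a dynamic execution adjustment step but does not describe its mechanics. As a rough illustration of the general idea (run the query, and if the database rejects it, pass the error message back for a revision), here is a minimal sketch. It assumes a SQLite database; `execute_with_adjustment`, `toy_repair`, and `max_rounds` are hypothetical names introduced for this example, and `toy_repair` is a hard-coded stand-in for whatever model SEA-SQL would actually query.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: (failed_sql, error_message) -> revised SQL or None.
Repair = Callable[[str, str], Optional[str]]


def execute_with_adjustment(
    conn: sqlite3.Connection,
    sql: str,
    repair: Repair,
    max_rounds: int = 3,
) -> Tuple[str, Optional[List[tuple]]]:
    """Run `sql`; on a database error, ask `repair` for a revision and retry.

    Returns the last query tried and its rows, or None for the rows if no
    executable query was found within `max_rounds` revisions.
    """
    for attempt in range(max_rounds + 1):
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_rounds:
                break  # revision budget exhausted
            revised = repair(sql, str(exc))
            if not revised or revised == sql:
                break  # the repairer gave up or made no change
            sql = revised
    return sql, None


def toy_repair(sql: str, error: str) -> Optional[str]:
    """Stand-in for an LLM call: fixes one known misspelled column name."""
    if "no such column: nme" in error:
        return sql.replace("nme", "name")
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ann', 30)")

final_sql, rows = execute_with_adjustment(conn, "SELECT nme FROM singer", toy_repair)
print(final_sql, rows)
```

Capping the number of revision rounds keeps the cost of this step bounded, which matters in a setting where generation cost is the quantity being minimized.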