SCoT2S: Self-correcting Text-to-SQL parsing by leveraging LLMs

IF 3.4 · Zone 3, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Speech and Language · Pub Date: 2025-07-31 · DOI: 10.1016/j.csl.2025.101865

Chunlin Zhu, Yuming Lin, Yaojun Cai, You Li

Computer Speech and Language, Volume 95, Article 101865. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0885230825000907

Citations: 0

Abstract

Text-to-SQL parsing, which converts natural language questions into executable SQL queries, has emerged as a critical technology for enabling non-technical users to interact with databases effectively. Although recent advances in this field have shown promise, existing models still struggle with complex semantic understanding and accurate SQL generation, particularly when handling schema relationships and join operations. To address these challenges, we propose SCoT2S (Self-Correcting Text-to-SQL), a novel framework that leverages large language models to automatically identify and rectify errors in SQL query generation. Through a systematic error analysis of existing Text-to-SQL models, we find that schema linking and join operations account for more than 70% of parsing errors. SCoT2S addresses these issues with a three-stage approach: initial SQL generation, comprehensive error detection, and targeted correction using large language models. This approach enables real-time error identification and correction during the parsing process. Extensive experiments on the Spider benchmark dataset demonstrate the effectiveness of SCoT2S. Specifically, SCoT2S achieves a 2.8% increase in exact-match (EM) scores and a 4.0% increase in execution-accuracy (EX) scores compared to current state-of-the-art methods.
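
The three stages named in the abstract (generate, detect, correct) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how SCoT2S detects or corrects errors, so the sketch assumes the simplest possible detector (asking SQLite to plan the query) and uses hypothetical stand-in functions, `generate` and `correct`, where a real system would call a parser and a large language model. The `singer`/`concert` schema is also invented for illustration.

```python
import sqlite3
from typing import Callable, Optional


def detect_error(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Stage 2 (assumed detector): return the engine's error message, or None.

    EXPLAIN QUERY PLAN makes SQLite compile the statement without running it,
    so unknown tables or columns are reported without touching any data.
    """
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def self_correct(
    question: str,
    conn: sqlite3.Connection,
    generate: Callable[[str], str],
    correct: Callable[[str, str, str], str],
    max_rounds: int = 2,
) -> str:
    """Generate SQL, then alternate detection and correction up to max_rounds."""
    sql = generate(question)                    # Stage 1: initial SQL
    for _ in range(max_rounds):
        error = detect_error(conn, sql)         # Stage 2: error detection
        if error is None:
            break
        sql = correct(question, sql, error)     # Stage 3: targeted correction
    return sql


# Hypothetical schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE concert (id INTEGER PRIMARY KEY, singer_id INTEGER, year INTEGER);
    """
)


def generate(question: str) -> str:
    # Stand-in for a Text-to-SQL parser that makes a schema-linking mistake:
    # `name` lives in `singer`, not in `concert`.
    return "SELECT name FROM concert WHERE year = 2014"


def correct(question: str, sql: str, error: str) -> str:
    # Stand-in for an LLM call that receives the question, the failing SQL,
    # and the error message, and returns a repaired query with the join.
    return (
        "SELECT s.name FROM singer AS s "
        "JOIN concert AS c ON c.singer_id = s.id WHERE c.year = 2014"
    )


question = "Which singers performed in 2014?"
first_error = detect_error(conn, generate(question))
final_sql = self_correct(question, conn, generate, correct)
print("initial error:", first_error)
print("final SQL:", final_sql)
```

A detector of this kind only catches queries the database engine rejects. A query that runs but answers the wrong question (for example, one that joins on the wrong key) passes unnoticed, which is presumably why the abstract describes the detection stage as comprehensive rather than purely execution-based.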


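Source journal

Computer Speech and Language (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Articles published: not listed

Review time: 22.9 weeks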

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing have become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.