SCoT2S: Self-correcting Text-to-SQL parsing by leveraging LLMs
Authors: Chunlin Zhu, Yuming Lin, Yaojun Cai, You Li
Journal: Computer Speech and Language, vol. 95, Article 101865
Publication date: 2025-07-31
DOI: 10.1016/j.csl.2025.101865
URL: https://www.sciencedirect.com/science/article/pii/S0885230825000907
Citation count: 0
Abstract
Text-to-SQL parsing, which converts natural language questions into executable SQL queries, has emerged as a critical technology for enabling non-technical users to interact with databases effectively. Although recent advances in this field have shown promise, existing models still struggle with complex semantic understanding and accurate SQL generation, particularly in handling schema relationships and join operations. To address these challenges, we propose SCoT2S (Self-Correcting Text-to-SQL), a novel framework that leverages large language models (LLMs) to automatically identify and rectify errors in SQL query generation. Through a systematic error analysis of existing Text-to-SQL models, we find that schema linking and join operations account for more than 70% of parsing errors. SCoT2S addresses these issues with a three-stage approach: initial SQL generation, comprehensive error detection, and targeted correction using LLMs. This approach enables real-time error identification and correction during the parsing process. Extensive experiments on the Spider benchmark dataset demonstrate the effectiveness of SCoT2S. Specifically, it improves exact match (EM) scores by 2.8% and execution accuracy (EX) scores by 4.0% compared to current state-of-the-art methods.
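The three-stage loop described in the abstract (generate, detect, correct) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how detection or correction is done, so the sketch assumes detection means asking SQLite whether it can plan the query against the schema, and it uses toy functions in place of the LLM calls. All names (`detect_error`, `self_correct`, `toy_generate`, `toy_correct`) and the example schema are hypothetical.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical signatures; none of these names come from the paper.
Generator = Callable[[str, str], str]            # (question, schema) -> SQL
Corrector = Callable[[str, str, str, str], str]  # (question, schema, sql, error) -> SQL


def detect_error(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return SQLite's error message if the query cannot be planned, else None."""
    try:
        conn.execute("EXPLAIN QUERY PLAN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def self_correct(question: str, schema: str, generate: Generator,
                 correct: Corrector, max_rounds: int = 2) -> str:
    """Generate SQL, then alternate detection and correction up to max_rounds times."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)               # empty tables are enough for planning
        sql = generate(question, schema)         # stage 1: initial SQL generation
        for _ in range(max_rounds):
            error = detect_error(conn, sql)      # stage 2: error detection (assumed form)
            if error is None:
                break
            sql = correct(question, schema, sql, error)  # stage 3: targeted correction
        return sql
    finally:
        conn.close()


# Illustrative schema, not taken from the paper.
SCHEMA = """
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER, year INTEGER);
"""


def toy_generate(question: str, schema: str) -> str:
    # Stand-in for the initial parser. Deliberately wrong: `year` belongs to
    # `concert`, so the query is missing a join.
    return "SELECT name FROM singer WHERE year = 2014"


def toy_correct(question: str, schema: str, sql: str, error: str) -> str:
    # Stand-in for an LLM call that would receive the question, schema,
    # faulty SQL, and error message in its prompt.
    return ("SELECT s.name FROM singer AS s "
            "JOIN concert AS c ON s.singer_id = c.singer_id WHERE c.year = 2014")


fixed = self_correct("Which singers performed in 2014?", SCHEMA,
                     toy_generate, toy_correct)
print(fixed)
```

A database-engine check of this kind only catches errors that make the query invalid, such as a column referenced without its table being joined. A query that runs but answers the wrong question would need a semantic check, which is presumably where an LLM-based detector comes in.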
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.