Xiang Li, Jinguo You, Heng Li, Jun Peng, Xi Chen, Ziheng Guo
CM-SQL: A cross-model consistency framework for text-to-SQL
Neurocomputing, Volume 658, Article 131708. Published 2025-10-04. DOI: 10.1016/j.neucom.2025.131708
Citations: 0
Abstract
In recent years, large language models (LLMs) have been widely applied to the Text-to-SQL task. Most LLM-based Text-to-SQL methods currently improve the accuracy of generated SQL in two ways: (1) schema linking; and (2) leveraging the model's self-consistency to check, modify, and select the generated SQL. However, due to issues such as hallucination, the database schema produced during the schema-linking phase may contain errors or omissions. Moreover, LLMs often exhibit overconfidence when evaluating the correctness of their own outputs. To address these issues, we propose a cross-model consistency SQL generation framework (CM-SQL), which generates SQL from different perspectives by feeding two database schemas into two LLMs. The framework combines the stability of fine-tuned models with the strong reasoning capabilities of LLMs to evaluate the generated SQL. Additionally, we propose a local modification strategy to correct erroneous SQL. Finally, the outputs of the evaluation module and the LLM are used to select among candidate SQLs, yielding the final SQL. We evaluated the proposed framework on the BIRD dev dataset using GPT-4o-mini and DeepSeek-V2.5, achieving an execution accuracy of 65.65%. On the test set of the Spider dataset, the execution accuracy reached 87.6%, significantly outperforming most methods based on the same LLMs. Furthermore, our performance is comparable to many approaches that rely on more expensive models, such as GPT-4.
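The core selection idea described above (choosing a final SQL by agreement among candidates produced from different models and schemas) can be sketched as majority voting over execution results. The sketch below is an illustration, not the paper's implementation; the function names (`execute_sql`, `select_by_cross_model_consistency`) and the toy schema are hypothetical, and the actual CM-SQL framework additionally uses a fine-tuned evaluation module and a local modification strategy that this sketch omits.

```python
import sqlite3
from collections import Counter

def execute_sql(conn, sql):
    """Run a SQL query and return its result set as a hashable value (None on error)."""
    try:
        rows = conn.execute(sql).fetchall()
        return tuple(sorted(map(tuple, rows)))
    except sqlite3.Error:
        return None

def select_by_cross_model_consistency(conn, candidates):
    """Pick the candidate SQL whose execution result is shared by the most
    candidates (majority voting over result sets, across models)."""
    results = {sql: execute_sql(conn, sql) for sql in candidates}
    valid = {sql: r for sql, r in results.items() if r is not None}
    if not valid:
        return None
    best_result, _ = Counter(valid.values()).most_common(1)[0]
    # Return the first candidate that produces the winning result.
    for sql in candidates:
        if valid.get(sql) == best_result:
            return sql

# Toy database and SQL candidates from two hypothetical models.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "eng", 100.0), (2, "eng", 120.0), (3, "ops", 90.0)])
candidates = [
    "SELECT dept FROM emp WHERE salary > 95",    # from model A
    "SELECT dept FROM emp WHERE salary > 95.0",  # from model B, same result
    "SELECT dept FROM emp WHERE salary > 150",   # outlier candidate
]
chosen = select_by_cross_model_consistency(conn, candidates)
```

Voting on execution results rather than on SQL strings lets syntactically different but semantically equivalent candidates reinforce each other, which is what makes cross-model agreement a useful correctness signal.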
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering theory, practice, and applications.