Distilling Large Language Models for Structured Query Language generation with reverse Kullback–Leibler and dynamic α scheduler

IF 4.9 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Nhat Le, Quan Ninh, Tung Le, Huy Tien Nguyen
{"title":"基于逆Kullback-Leibler和动态α调度器的大型语言模型提取与结构化查询语言生成","authors":"Nhat Le ,&nbsp;Quan Ninh ,&nbsp;Tung Le ,&nbsp;Huy Tien Nguyen","doi":"10.1016/j.compeleceng.2025.110607","DOIUrl":null,"url":null,"abstract":"<div><div>With the increasing reliance on data-driven decision-making, the demand for efficient Structured Query Language (SQL) query generation has grown significantly, as it serves as a crucial bridge between natural language and databases. While Large Language Models (LLMs) excel in this task, their high computational costs and inconsistent effectiveness pose significant limitations. This study introduces a knowledge distillation approach to create efficient, high-performing models for SQL generation. By integrating teacher and student model distributions with a dynamic <span><math><mi>α</mi></math></span> scheduler inspired by learning rate schedulers, the method adjusts the teacher’s influence during training, enhancing stability and narrowing performance gaps. Additionally, reverse Kullback–Leibler Divergence (KLD) loss balances contributions, allowing the student model to refine itself while leveraging teacher guidance. The resulting distilled student model, which is 100 times smaller than GPT-4, achieves 80.5% accuracy on benchmark datasets and outperforms GPT-4 in this domain. Furthermore, it demonstrates a 10.2% improvement on extra-hard questions compared to its undistilled counterpart. This work highlights the potential of optimizing LLMs for reduced computational costs and superior performance in SQL query generation and beyond.</div></div>","PeriodicalId":50630,"journal":{"name":"Computers & Electrical Engineering","volume":"127 ","pages":"Article 110607"},"PeriodicalIF":4.9000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distilling Large Language Models for Structured Query Language generation with reverse Kullback–Leibler and dynamic α scheduler\",\"authors\":\"Nhat Le ,&nbsp;Quan Ninh ,&nbsp;Tung Le ,&nbsp;Huy Tien Nguyen\",\"doi\":\"10.1016/j.compeleceng.2025.110607\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the increasing reliance on data-driven decision-making, the demand for efficient Structured Query Language (SQL) query generation has grown significantly, as it serves as a crucial bridge between natural language and databases. While Large Language Models (LLMs) excel in this task, their high computational costs and inconsistent effectiveness pose significant limitations. This study introduces a knowledge distillation approach to create efficient, high-performing models for SQL generation. By integrating teacher and student model distributions with a dynamic <span><math><mi>α</mi></math></span> scheduler inspired by learning rate schedulers, the method adjusts the teacher’s influence during training, enhancing stability and narrowing performance gaps. Additionally, reverse Kullback–Leibler Divergence (KLD) loss balances contributions, allowing the student model to refine itself while leveraging teacher guidance. The resulting distilled student model, which is 100 times smaller than GPT-4, achieves 80.5% accuracy on benchmark datasets and outperforms GPT-4 in this domain. Furthermore, it demonstrates a 10.2% improvement on extra-hard questions compared to its undistilled counterpart. 
This work highlights the potential of optimizing LLMs for reduced computational costs and superior performance in SQL query generation and beyond.</div></div>\",\"PeriodicalId\":50630,\"journal\":{\"name\":\"Computers & Electrical Engineering\",\"volume\":\"127 \",\"pages\":\"Article 110607\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-08-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Electrical Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0045790625005506\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Electrical Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0045790625005506","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

With the increasing reliance on data-driven decision-making, the demand for efficient Structured Query Language (SQL) query generation has grown significantly, as it serves as a crucial bridge between natural language and databases. While Large Language Models (LLMs) excel in this task, their high computational costs and inconsistent effectiveness pose significant limitations. This study introduces a knowledge distillation approach to create efficient, high-performing models for SQL generation. By integrating teacher and student model distributions with a dynamic α scheduler inspired by learning rate schedulers, the method adjusts the teacher’s influence during training, enhancing stability and narrowing performance gaps. Additionally, reverse Kullback–Leibler Divergence (KLD) loss balances contributions, allowing the student model to refine itself while leveraging teacher guidance. The resulting distilled student model, which is 100 times smaller than GPT-4, achieves 80.5% accuracy on benchmark datasets and outperforms GPT-4 in this domain. Furthermore, it demonstrates a 10.2% improvement on extra-hard questions compared to its undistilled counterpart. This work highlights the potential of optimizing LLMs for reduced computational costs and superior performance in SQL query generation and beyond.
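To make the two ingredients described in the abstract concrete, the following is a minimal PyTorch-style sketch of a reverse-KL distillation loss whose teacher weight α is set by a dynamic scheduler. It is illustrative only: the cosine-shaped schedule, the α endpoints, the temperature, and the combination with a cross-entropy term are assumptions for the sketch, not the authors' exact formulation.

```python
import math
import torch
import torch.nn.functional as F


def alpha_schedule(step: int, total_steps: int,
                   alpha_max: float = 0.9, alpha_min: float = 0.1) -> float:
    """Decay the teacher's weight alpha over training with a cosine curve.

    Hypothetical schedule: the paper only states that the alpha scheduler is
    inspired by learning-rate schedulers; the exact curve and endpoints here
    are illustrative assumptions.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    return alpha_min + 0.5 * (alpha_max - alpha_min) * (1 + math.cos(math.pi * progress))


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      target_ids: torch.Tensor,
                      alpha: float,
                      temperature: float = 1.0) -> torch.Tensor:
    """Blend reverse KL(student || teacher) with next-token cross-entropy.

    Reverse KL is mode-seeking: the expectation is taken under the student's
    own distribution, so the student can concentrate on modes it can model
    well instead of spreading mass to cover every teacher mode.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)

    # KL(student || teacher) = sum_v p_s(v) * (log p_s(v) - log p_t(v)),
    # averaged over token positions.
    reverse_kl = (log_p_student.exp() * (log_p_student - log_p_teacher)).sum(dim=-1).mean()

    # Standard cross-entropy against the ground-truth SQL tokens.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         target_ids.view(-1))

    return alpha * reverse_kl + (1.0 - alpha) * ce
```

In a training loop, α would be recomputed each step via `alpha_schedule(step, total_steps)` and passed to `distillation_loss`, mirroring how a learning-rate scheduler updates its value: early in training the teacher dominates, and its influence tapers off as the student stabilizes.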
Source Journal
Computers & Electrical Engineering (Engineering & Technology – Electronic & Electrical Engineering)
CiteScore: 9.20
Self-citation rate: 7.00%
Articles per year: 661
Review time: 47 days
Journal description: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.