Optimizing text-to-SQL conversion techniques through the integration of intelligent agents and large language models

IF 7.4 · Zone 1, Management · Q1 Computer Science, Information Systems

Information Processing & Management · Pub Date: 2025-04-27 · DOI: 10.1016/j.ipm.2025.104136

Samuel Ojuri, The Anh Han, Raymond Chiong, Alessandro Di Stefano

Volume 62, Issue 5, Article 104136. Full text: https://www.sciencedirect.com/science/article/pii/S0306457325000780

Citations: 0

Abstract

In many organizations, retrieving valuable information from complex databases has traditionally required specialized technical skills, often leaving non-technical professionals dependent on others for timely insights. This study presents an approach that allows anyone, even without knowledge of query languages, to directly interact with databases by asking questions in everyday language. We achieve this by combining advanced generative language models, such as a high-capacity Generative Pre-trained Transformer (GPT) model, with intelligent software agents that translate natural language queries into precise SQL statements. Our evaluation compares different strategies, including models specifically trained on a particular database domain versus those guided by only a handful of examples. The results show that training a model with tailored examples yields more accurate and reliable database queries than relying solely on minimal guidance for the given use case. This work highlights the practical value of refining model complexity and balancing computational costs to empower business users with easy, direct access to data. By reducing reliance on technical teams, organizations can enable faster, more informed decision-making and foster a more inclusive environment where everyone can uncover data-driven insights on their own.
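To make the "handful of examples" strategy concrete, here is a minimal sketch of few-shot prompting for text-to-SQL with a simple agent-style check before execution. It is an illustration under stated assumptions, not the authors' implementation: the schema, the example pairs, the `build_prompt`, `is_read_only`, and `answer` helpers, and the `fake_model` stand-in for a language-model call are all hypothetical.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical schema and few-shot pairs; the paper's prompts are not shown here.
SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);"
FEW_SHOT: List[Tuple[str, str]] = [
    ("How many customers are there?", "SELECT COUNT(*) FROM customers;"),
    ("List customers in York.", "SELECT name FROM customers WHERE city = 'York';"),
]


def build_prompt(schema: str, examples: List[Tuple[str, str]], question: str) -> str:
    """Assemble schema, worked examples, and the new question into one prompt."""
    parts = ["Translate the question into one SQLite query.", "Schema:", schema, ""]
    for example_question, example_sql in examples:
        parts += [f"Question: {example_question}", f"SQL: {example_sql}", ""]
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


def is_read_only(sql: str) -> bool:
    """Crude guard: accept a single SELECT statement and nothing else."""
    body = sql.strip().rstrip(";")
    return body.lower().startswith("select") and ";" not in body


def answer(conn: sqlite3.Connection, question: str,
           generate: Callable[[str], str]) -> list:
    """Generate SQL for the question, check it, then run it."""
    sql = generate(build_prompt(SCHEMA, FEW_SHOT, question)).strip()
    if not is_read_only(sql):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def fake_model(prompt: str) -> str:
    """Stand-in for a language-model call (assumption: returns plain SQL text)."""
    return "SELECT name FROM customers WHERE city = 'Leeds';"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Leeds"), ("Bo", "York")],
)
rows = answer(conn, "Which customers live in Leeds?", fake_model)
print(rows)
```

In the fine-tuned setting the abstract compares against, the same `generate` callable would wrap a model trained on domain-specific question and SQL pairs, and the prompt could omit the worked examples.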

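Source journal

Information Processing & Management (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days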

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.