Text2SQL Business Intelligence System Based on Retrieval-Augmented Generation (RAG)
Jie Liu, Shiwei Chu
Engineering Reports (open access), vol. 7, no. 6, published 2025-06-16. DOI: 10.1002/eng2.70249
Modern enterprises increasingly depend on data-driven decision-making, yet writing traditional SQL queries requires technical expertise, which limits access for nonspecialists. Advances in natural language processing, particularly deep generative models, have enabled text-to-SQL (text2SQL) conversion, making database interaction more intuitive. Retrieval-Augmented Generation (RAG) improves on this by combining retrieval with generation, so that generated queries are grounded in relevant retrieved context and are therefore more accurate and relevant. This article proposes a RAG-based text2SQL business intelligence system that lets enterprise users extract insights from complex databases through natural language queries. The system streamlines data retrieval, lowers technical barriers, and achieves state-of-the-art performance in generating SQL queries for complex tasks. It uses the BERT (Bidirectional Encoder Representations from Transformers) model for vectorized retrieval, Generative Pretrained Transformer 4 (GPT-4) for query generation, and Graph Neural Networks (GNNs) for modeling database structure. User interaction and feedback mechanisms further refine semantic understanding and query accuracy.

Experimental results demonstrate the system's effectiveness. For multitable joins, the query matching accuracy of BERT + GPT-4 + GNN reaches 52.3% and 55.1% with beam widths of 1 and 10, respectively. For nested queries involving multitable joins, accuracy rises to 60.2% and 61.9% under the same settings. The system also receives the highest user satisfaction scores among the compared systems, supporting its practical utility. By improving the handling of complex queries and reducing barriers to data access, the proposed RAG-based text2SQL system gives enterprise users an efficient, user-friendly tool for database interaction, significantly improving decision-making processes.
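The retrieve-then-generate idea behind the system can be illustrated with a minimal sketch of the retrieval step: rank schema descriptions by cosine similarity to the question embedding, keep the top k, and assemble them into a prompt for the generator. This is not the authors' implementation. It assumes embeddings are already computed (the paper uses BERT; toy 3-dimensional vectors stand in here), the names `SchemaItem`, `retrieve`, and `build_prompt` and the example tables are hypothetical, and the GPT-4 call and GNN schema modeling are not shown.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class SchemaItem:
    """Hypothetical retrievable unit of context, e.g. one table description."""
    name: str
    text: str
    vector: list[float]  # assumed precomputed embedding (BERT in the paper)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve(query_vec: list[float], items: list[SchemaItem], k: int = 2) -> list[SchemaItem]:
    """Return the k items whose embeddings are most similar to the query embedding."""
    ranked = sorted(items, key=lambda it: cosine(query_vec, it.vector), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[SchemaItem]) -> str:
    """Assemble a generation prompt from retrieved schema; the LLM call itself is omitted."""
    lines = ["Given the following schema, write one SQL query.", ""]
    for item in context:
        lines.append(f"-- {item.name}: {item.text}")
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)


# Toy usage: hypothetical tables with hand-picked embeddings.
items = [
    SchemaItem("orders", "orders(order_id, customer_id, amount, created_at)", [0.9, 0.1, 0.0]),
    SchemaItem("customers", "customers(customer_id, name, region)", [0.7, 0.6, 0.0]),
    SchemaItem("inventory", "inventory(sku, warehouse, qty)", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.4, 0.0]  # toy embedding of the user's question
top = retrieve(query_vec, items, k=2)
print([it.name for it in top])
print(build_prompt("Total order amount per region in 2024?", top))
```

With these toy vectors, the two join-relevant tables (`customers`, `orders`) are retrieved and the unrelated `inventory` table is left out of the prompt. Restricting the generator's context to retrieved schema in this way is the general motivation for RAG in text2SQL; a production system would replace the toy vectors with a real encoder and pass the prompt to a language model.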