Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang
arXiv - CS - Databases. Published 2024-08-09. DOI: arxiv-2408.05109 (https://doi.org/arxiv-2408.05109)
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
Translating users' natural language queries (NL) into SQL queries (i.e.,
NL2SQL) can significantly reduce barriers to accessing relational databases and
support various commercial applications. The performance of NL2SQL has been
greatly enhanced with the emergence of Large Language Models (LLMs). In this
survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs,
covering the entire NL2SQL lifecycle from the following four aspects: (1) Model:
NL2SQL translation techniques that not only tackle NL ambiguity and
under-specification, but also properly map NL to the database schema and
instances; (2) Data: from the collection of training data and data synthesis to
address training data scarcity, to NL2SQL benchmarks; (3) Evaluation: evaluating
NL2SQL methods from multiple angles using different metrics and granularities;
and (4) Error Analysis: analyzing NL2SQL errors to find their root causes and
guide NL2SQL models to evolve. We also provide a rule of thumb for
developing NL2SQL solutions. Finally, we discuss the research challenges and
open problems of NL2SQL in the LLM era.