Combining computational linguistics with sentence embedding to create a zero-shot NLIDB

Impact factor: 2.3; JCR: Q2 (Computer Science, Theory & Methods)

Array. Publication date: 2024-10-24. DOI: 10.1016/j.array.2024.100368

Yuriy Perezhohin, Fernando Peres, Mauro Castelli

Array, volume 24, Article 100368. Source: https://www.sciencedirect.com/science/article/pii/S2590005624000341

Citations: 0

Abstract


Combining computational linguistics with sentence embedding to create a zero-shot NLIDB

Accessing relational databases using natural language is a challenging task, with existing methods often suffering from poor domain generalization and high computational costs. In this study, we propose a novel approach that eliminates the training phase while offering high adaptability across domains. Our method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to accurately translate natural language queries into SQL. Experimental results on the SPIDER benchmark demonstrate the effectiveness of our approach, with execution accuracy rates of 72.03% on the training set and 70.83% on the development set, while maintaining domain flexibility. Furthermore, the proposed system outperformed two extensively trained models by up to 28.33% on the development set, demonstrating its efficiency. This research presents a significant advancement in zero-shot Natural Language Interfaces for Databases (NLIDBs), providing a resource-efficient alternative for generating accurate SQL queries from plain language inputs.
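The execution accuracy figures quoted above refer to whether a generated query returns the same result as the reference query when both are run against the database. The sketch below illustrates that idea only; it is not the authors' code or the official SPIDER evaluation script. The toy `singer` table, the query pairs, and the helper names `run_query` and `execution_accuracy` are all hypothetical, and row order is ignored here as a simplifying assumption.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result multisets match."""
    if not pairs:
        return 0.0
    hits = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        # A failing predicted query yields None and therefore never matches.
        if gold_rows is not None and pred_rows == gold_rows:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Bo', 45), (3, 'Cy', 27);
""")

pairs = [
    # Different SQL text, same rows on this data: counts as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Valid SQL but a different result (3 vs 2): counts as wrong.
    ("SELECT count(*) FROM singer",
     "SELECT count(*) FROM singer WHERE age < 40"),
    # Predicted query references a missing column: counts as wrong.
    ("SELECT nam FROM singer",
     "SELECT name FROM singer"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.2%}")
```

Comparing results rather than SQL text means that two differently written queries can both count as correct, which is why this kind of metric suits systems that do not reproduce the reference query verbatim.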


Source journal

Array (Computer Science: General Computer Science)

CiteScore

4.40

Self-citation rate

0.00%

Articles published

Review time

45 days