Combining computational linguistics with sentence embedding to create a zero-shot NLIDB
Yuriy Perezhohin, Fernando Peres, Mauro Castelli
Array, Volume 24, Article 100368 (2024). DOI: 10.1016/j.array.2024.100368
Citations: 0
Abstract
Accessing relational databases using natural language is a challenging task, with existing methods often suffering from poor domain generalization and high computational costs. In this study, we propose a novel approach that eliminates the training phase while offering high adaptability across domains. Our method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to accurately translate natural language queries into SQL. Experimental results on the SPIDER benchmark demonstrate the effectiveness of our approach, with execution accuracy rates of 72.03% on the training set and 70.83% on the development set, while maintaining domain flexibility. Furthermore, the proposed system outperformed two extensively trained models by up to 28.33% on the development set, demonstrating its efficiency. This research presents a significant advancement in zero-shot Natural Language Interfaces for Databases (NLIDBs), providing a resource-efficient alternative for generating accurate SQL queries from plain language inputs.
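The abstract describes combining structured linguistic rules, a curated vocabulary, and pre-trained embedding models to translate questions into SQL without a training phase. As a rough illustration only (not the paper's actual pipeline), the sketch below matches a question to schema elements by sentence-embedding similarity and applies a tiny keyword rule for aggregation. The model name, toy schema, and helper function are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of embedding-based, zero-shot NL-to-SQL matching.
# NOT the authors' implementation: the model name, toy schema, and the
# keyword "rule" below are illustrative assumptions only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any pre-trained sentence encoder

# Toy schema vocabulary: table.column paired with a natural-language description.
schema = {
    "singer.name": "name of the singer",
    "singer.age": "age of the singer",
    "concert.year": "year the concert took place",
}
schema_keys = list(schema)
schema_emb = model.encode(list(schema.values()), convert_to_tensor=True)

def question_to_sql(question: str) -> str:
    """Pick the schema column whose description is closest to the question,
    then fill a simple SELECT template (keyword-based aggregation detection)."""
    q_emb = model.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, schema_emb)[0]
    table, column = schema_keys[int(scores.argmax())].split(".")
    # A very small stand-in for a "linguistic rule" layer: detect aggregation cues.
    agg = any(w in question.lower() for w in ("how many", "number of"))
    target = f"COUNT({column})" if agg else column
    return f"SELECT {target} FROM {table}"

print(question_to_sql("How many singers are there?"))
print(question_to_sql("Show the ages of all singers."))
```

The paper's system is considerably richer (structured rules and a curated vocabulary drive the SQL construction), but this sketch shows why no task-specific training is needed: the pre-trained encoder supplies the semantic matching, and deterministic rules supply the SQL structure.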