Deep Learning Based Text to SQL Conversion on WikiSQL Dataset: Comparative Analysis

2022 3rd International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT). Pub Date: 2022-11-11. DOI: 10.1109/ICICT55121.2022.10064556

Kushal Lahoti, Muskan Paryani, A. Patil


Abstract

Driven by rapid expansion in the banking, business, and IT sectors, data is growing at an exponential rate, and most of it is stored in relational databases. Accessing and modifying this information requires proficiency in a query language such as SQL (Structured Query Language), which many end users lack. A number of models have therefore been proposed to convert English questions into SQL queries for extracting data from databases, with the goal of matching human performance on the WikiSQL dataset. Text-to-SQL conversion is a step toward an end-to-end solution that provides an intuitive text interface to the many data sources within a corporate ecosystem, allowing users to obtain real-time data visualizations or insights. Some of the models developed to date have already surpassed 90% test accuracy on the WikiSQL dataset. However, more research is needed to develop a highly scalable model that achieves higher accuracy on custom and previously unseen databases. The purpose of this paper is to shed light on the different strategies used in this domain and to understand how they contrast. We focus mainly on the factors that significantly affect performance and accuracy on the task. The paper presents a comprehensive survey of the most prominent algorithms and analyses them against several factors.


