文本到sql系统的n -最佳假设重新排序

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-19 DOI:10.1109/SLT54892.2023.10023434

Lu Zeng, S. Parthasarathi, Dilek Z. Hakkani-Tür

{"title":"文本到sql系统的n -最佳假设重新排序","authors":"Lu Zeng, S. Parthasarathi, Dilek Z. Hakkani-Tür","doi":"10.1109/SLT54892.2023.10023434","DOIUrl":null,"url":null,"abstract":"Text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models in conjunction with constrained decoding applying a SQL parser. On the well established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list, yields a 7.7% absolute improvement in both exact match (EM) and execution (EX) accuracy, showing significant potential improvements with reranking. Identifying coherence and correctness as reranking approaches, we design a model generating a query plan and propose a heuristic schema linking algorithm. Combining both approaches, with T5-Large, we obtain a consistent 1% improvement in EM accuracy, and a 2.5% improvement in EX, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.","PeriodicalId":352002,"journal":{"name":"2022 IEEE Spoken Language Technology Workshop (SLT)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"N-Best Hypotheses Reranking for Text-to-SQL Systems\",\"authors\":\"Lu Zeng, S. Parthasarathi, Dilek Z. Hakkani-Tür\",\"doi\":\"10.1109/SLT54892.2023.10023434\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models in conjunction with constrained decoding applying a SQL parser. On the well established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list, yields a 7.7% absolute improvement in both exact match (EM) and execution (EX) accuracy, showing significant potential improvements with reranking. Identifying coherence and correctness as reranking approaches, we design a model generating a query plan and propose a heuristic schema linking algorithm. Combining both approaches, with T5-Large, we obtain a consistent 1% improvement in EM accuracy, and a 2.5% improvement in EX, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.\",\"PeriodicalId\":352002,\"journal\":{\"name\":\"2022 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"144 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT54892.2023.10023434\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT54892.2023.10023434","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

文本到sql任务将自然语言话语映射到可以发出到数据库的结构化查询。最先进的(SOTA)系统依赖于对大型预训练语言模型进行微调，并结合应用SQL解析器进行约束解码。在完善的Spider数据集上，我们从Oracle研究开始:具体地说，从SOTA模型的10个最佳列表中选择一个Oracle假设，在精确匹配(EM)和执行(EX)准确性方面都有7.7%的绝对提高，显示出通过重新排序有显著的潜在改进。将一致性和正确性作为重新排序的方法，设计了一个生成查询计划的模型，并提出了一种启发式模式链接算法。将这两种方法与T5-Large相结合，我们获得了1%的EM精度提高，2.5%的EX提高，为该任务建立了新的SOTA。我们对DEV数据的全面误差研究表明，在这项任务上取得进展存在潜在困难。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

N-Best Hypotheses Reranking for Text-to-SQL Systems

Text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models in conjunction with constrained decoding applying a SQL parser. On the well established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list, yields a 7.7% absolute improvement in both exact match (EM) and execution (EX) accuracy, showing significant potential improvements with reranking. Identifying coherence and correctness as reranking approaches, we design a model generating a query plan and propose a heuristic schema linking algorithm. Combining both approaches, with T5-Large, we obtain a consistent 1% improvement in EM accuracy, and a 2.5% improvement in EX, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量