RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations

Annual Meeting of the Association for Computational Linguistics; Pub Date: 2023-06-25; DOI: 10.48550/arXiv.2306.14321

Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir R. Radev

Citations: 4

Abstract

Despite significant progress in question answering on tabular data (Table QA), it is unclear whether, and to what extent, existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns. To systematically study the robustness of Table QA models, we propose a benchmark called RobuT, which builds upon existing Table QA datasets (WTQ, WikiSQL-Weak, and SQA) and includes human-annotated adversarial perturbations of the table header, the table content, and the question. Our results indicate that both state-of-the-art Table QA models and large language models (e.g., GPT-3) with few-shot learning falter on these adversarial sets. We propose to address this problem by using large language models to generate adversarial examples to enhance training, which significantly improves the robustness of Table QA models.
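As a rough illustration of one perturbation the abstract names (shuffling table columns), the sketch below permutes the columns of a toy table and checks that a position-independent lookup still returns the same answer. It is not the RobuT annotation procedure, which relies on human annotators; the function names `shuffle_columns` and `lookup` and the table contents are hypothetical and made up for this example.

```python
import random


def shuffle_columns(header, rows, seed=0):
    """Return a copy of the table with its columns in a permuted order.

    Hypothetical helper for illustration only; not part of RobuT.
    """
    rng = random.Random(seed)
    order = list(range(len(header)))
    rng.shuffle(order)
    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_header, new_rows


def lookup(header, rows, key_col, key, target_col):
    """Toy stand-in for a QA model: find the row whose key_col equals key
    and return the value in target_col. It resolves columns by name, so it
    does not depend on column position."""
    k = header.index(key_col)
    t = header.index(target_col)
    for row in rows:
        if row[k] == key:
            return row[t]
    return None


# Made-up table, used only to exercise the functions above.
header = ["Player", "Team", "Points"]
rows = [
    ["Alice", "Red", "12"],
    ["Bob", "Blue", "7"],
    ["Carol", "Green", "19"],
]

new_header, new_rows = shuffle_columns(header, rows, seed=1)

# The gold answer should be unchanged by a column shuffle.
original_answer = lookup(header, rows, "Player", "Bob", "Points")
perturbed_answer = lookup(new_header, new_rows, "Player", "Bob", "Points")
print(original_answer == perturbed_answer)
```

A model that is robust to this perturbation should behave like `lookup` here and give the same answer before and after the shuffle; a model that implicitly relies on column position may not.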


