On the Robustness of Dialogue History Representation in Conversational Question Answering: A Comprehensive Study and a New Prompt-based Method

IF 4.2 1区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Transactions of the Association for Computational Linguistics Pub Date : 2022-06-29 DOI:10.1162/tacl_a_00549

Zorik Gekhman, Nadav Oved, Orgad Keller, Idan Szpektor, Roi Reichart

{"title":"On the Robustness of Dialogue History Representation in Conversational Question Answering: A Comprehensive Study and a New Prompt-based Method","authors":"Zorik Gekhman, Nadav Oved, Orgad Keller, Idan Szpektor, Roi Reichart","doi":"10.1162/tacl_a_00549","DOIUrl":null,"url":null,"abstract":"Most work on modeling the conversation history in Conversational Question Answering (CQA) reports a single main result on a common CQA benchmark. While existing models show impressive results on CQA leaderboards, it remains unclear whether they are robust to shifts in setting (sometimes to more realistic ones), training data size (e.g., from large to small sets) and domain. In this work, we design and conduct the first large-scale robustness study of history modeling approaches for CQA. We find that high benchmark scores do not necessarily translate to strong robustness, and that various methods can perform extremely differently under different settings. Equipped with the insights from our study, we design a novel prompt-based history modeling approach and demonstrate its strong robustness across various settings. Our approach is inspired by existing methods that highlight historic answers in the passage. However, instead of highlighting by modifying the passage token embeddings, we add textual prompts directly in the passage text. Our approach is simple, easy to plug into practically any model, and highly effective, thus we recommend it as a starting point for future model developers. We also hope that our study and insights will raise awareness to the importance of robustness-focused evaluation, in addition to obtaining high leaderboard scores, leading to better CQA systems.1","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"351-366"},"PeriodicalIF":4.2000,"publicationDate":"2022-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions of the Association for Computational Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1162/tacl_a_00549","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 3

Abstract

Most work on modeling the conversation history in Conversational Question Answering (CQA) reports a single main result on a common CQA benchmark. While existing models show impressive results on CQA leaderboards, it remains unclear whether they are robust to shifts in setting (sometimes to more realistic ones), training data size (e.g., from large to small sets) and domain. In this work, we design and conduct the first large-scale robustness study of history modeling approaches for CQA. We find that high benchmark scores do not necessarily translate to strong robustness, and that various methods can perform extremely differently under different settings. Equipped with the insights from our study, we design a novel prompt-based history modeling approach and demonstrate its strong robustness across various settings. Our approach is inspired by existing methods that highlight historic answers in the passage. However, instead of highlighting by modifying the passage token embeddings, we add textual prompts directly in the passage text. Our approach is simple, easy to plug into practically any model, and highly effective, thus we recommend it as a starting point for future model developers. We also hope that our study and insights will raise awareness to the importance of robustness-focused evaluation, in addition to obtaining high leaderboard scores, leading to better CQA systems.1

查看原文本刊更多论文

会话历史表征在会话问答中的稳健性研究——一种基于提示的新方法

在会话问答（CQA）中，大多数关于会话历史建模的工作都报告了一个通用CQA基准的单一主要结果。虽然现有模型在CQA排行榜上显示出了令人印象深刻的结果，但尚不清楚它们是否对设置（有时是更现实的设置）、训练数据大小（例如，从大集合到小集合）和领域的变化具有鲁棒性。在这项工作中，我们设计并进行了第一次大规模的CQA历史建模方法的稳健性研究。我们发现，高基准分数并不一定意味着强大的稳健性，而且在不同的设置下，各种方法的表现可能截然不同。根据我们研究的见解，我们设计了一种新颖的基于提示的历史建模方法，并在各种环境中证明了其强大的稳健性。我们的方法受到现有方法的启发，这些方法突出了文章中的历史答案。然而，我们没有通过修改段落标记嵌入来突出显示，而是直接在段落文本中添加文本提示。我们的方法简单，易于插入到几乎任何模型中，并且非常有效，因此我们建议将其作为未来模型开发人员的起点。我们还希望，我们的研究和见解将提高人们对以稳健性为重点的评估的重要性的认识，以及获得高排行榜分数，从而形成更好的CQA系统。1

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Transactions of the Association for Computational Linguistics Multiple-

CiteScore

32.60

自引率

4.60%

发文量

审稿时长

8 weeks

期刊介绍： The highly regarded quarterly journal Computational Linguistics has a companion journal called Transactions of the Association for Computational Linguistics. This open access journal publishes articles in all areas of natural language processing and is an important resource for academic and industry computational linguists, natural language processing experts, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, as well as linguists and philosophers. The journal disseminates work of vital relevance to these professionals on an annual basis.