Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality

Pei Zhou, Hyundong Justin Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, Jay Pujara, Xiang Ren
{"title":"Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality","authors":"Pei Zhou, Hyundong Justin Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, J. Pujara, Xiang Ren","doi":"10.48550/arXiv.2211.09267","DOIUrl":null,"url":null,"abstract":"Human communication relies on common ground (CG), the mutual knowledge and beliefs shared by participants, to produce coherent and interesting conversations. In this paper, we demonstrate that current response generation (RG) models produce generic and dull responses in dialogues because they act reflexively, failing to explicitly model CG, both due to the lack of CG in training data and the standard RG training procedure. We introduce Reflect, a dataset that annotates dialogues with explicit CG (materialized as inferences approximating shared knowledge and beliefs) and solicits 9k diverse human-generated responses each following one common ground. Using Reflect, we showcase the limitations of current dialogue data and RG models: less than half of the responses in current data is rated as high quality (sensible, specific, and interesting) and models trained using this data have even lower quality, while most Reflect responses are judged high quality. Next, we analyze whether CG can help models produce better quality responses by using Reflect CG to guide RG models. Surprisingly, we find that simply prompting GPT3 to “think” about CG generates 30% more quality responses, showing promising benefits to integrating CG into the RG process.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"86 1","pages":"10450-10468"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2211.09267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Human communication relies on common ground (CG), the mutual knowledge and beliefs shared by participants, to produce coherent and interesting conversations. In this paper, we demonstrate that current response generation (RG) models produce generic and dull responses in dialogues because they act reflexively, failing to explicitly model CG, due both to the lack of CG in training data and to the standard RG training procedure. We introduce Reflect, a dataset that annotates dialogues with explicit CG (materialized as inferences approximating shared knowledge and beliefs) and solicits 9k diverse human-generated responses, each following one piece of common ground. Using Reflect, we showcase the limitations of current dialogue data and RG models: fewer than half of the responses in current data are rated as high quality (sensible, specific, and interesting), and models trained on this data produce responses of even lower quality, while most Reflect responses are judged high quality. Next, we analyze whether CG can help models produce better-quality responses by using Reflect CG to guide RG models. Surprisingly, we find that simply prompting GPT3 to "think" about CG generates 30% more quality responses, showing promising benefits of integrating CG into the RG process.
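The final finding, that prompting GPT3 to "think" about common ground before responding improves response quality, lends itself to a two-step prompting sketch. The snippet below is a minimal illustration of that idea under stated assumptions, not the authors' exact prompts or pipeline: the `complete` helper, the prompt wording, and the model name are hypothetical stand-ins, and it assumes access to an OpenAI-compatible text-completion endpoint.

```python
# Hypothetical two-step "reflect before responding" prompting sketch.
# The prompt wording, model name, and client usage are illustrative
# assumptions, not the paper's exact setup.
import os
from openai import OpenAI  # assumes the openai Python package (v1+) is installed

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def complete(prompt: str, max_tokens: int = 64) -> str:
    """Call a text-completion model and return the generated text."""
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # stand-in for the GPT3 engine used in the paper
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.7,
    )
    return resp.choices[0].text.strip()

def respond_with_common_ground(dialogue_history: str) -> str:
    # Step 1 ("reflect"): elicit an explicit common-ground inference,
    # e.g. what the speaker likely feels, wants, or believes.
    inference_prompt = (
        f"Dialogue:\n{dialogue_history}\n\n"
        "Before replying, state one thing the listener can infer about "
        "what the speaker feels or wants:\n"
    )
    inference = complete(inference_prompt)

    # Step 2: condition the response on that inference instead of
    # replying "reflexively" from the dialogue history alone.
    response_prompt = (
        f"Dialogue:\n{dialogue_history}\n\n"
        f"Inferred common ground: {inference}\n\n"
        "Write a specific, interesting next response that uses this inference:\n"
    )
    return complete(response_prompt)

if __name__ == "__main__":
    history = "A: I finally finished my thesis draft last night.\nB:"
    print(respond_with_common_ground(history))
```

The key design point this sketch illustrates is the separation of inference from generation: the model is first asked to make the shared knowledge explicit, and only then to respond conditioned on it, which is the behavior the abstract contrasts with "reflexive" response generation.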