Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality

Pei Zhou, Hyundong Justin Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, Jay Pujara, Xiang Ren
{"title":"Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality","authors":"Pei Zhou, Hyundong Justin Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, J. Pujara, Xiang Ren","doi":"10.48550/arXiv.2211.09267","DOIUrl":null,"url":null,"abstract":"Human communication relies on common ground (CG), the mutual knowledge and beliefs shared by participants, to produce coherent and interesting conversations. In this paper, we demonstrate that current response generation (RG) models produce generic and dull responses in dialogues because they act reflexively, failing to explicitly model CG, both due to the lack of CG in training data and the standard RG training procedure. We introduce Reflect, a dataset that annotates dialogues with explicit CG (materialized as inferences approximating shared knowledge and beliefs) and solicits 9k diverse human-generated responses each following one common ground. Using Reflect, we showcase the limitations of current dialogue data and RG models: less than half of the responses in current data is rated as high quality (sensible, specific, and interesting) and models trained using this data have even lower quality, while most Reflect responses are judged high quality. Next, we analyze whether CG can help models produce better quality responses by using Reflect CG to guide RG models. Surprisingly, we find that simply prompting GPT3 to “think” about CG generates 30% more quality responses, showing promising benefits to integrating CG into the RG process.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"86 1","pages":"10450-10468"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2211.09267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Human communication relies on common ground (CG), the mutual knowledge and beliefs shared by participants, to produce coherent and interesting conversations. In this paper, we demonstrate that current response generation (RG) models produce generic and dull responses in dialogues because they act reflexively, failing to explicitly model CG, due both to the lack of CG in training data and to the standard RG training procedure. We introduce Reflect, a dataset that annotates dialogues with explicit CG (materialized as inferences approximating shared knowledge and beliefs) and solicits 9k diverse human-generated responses, each following one piece of common ground. Using Reflect, we showcase the limitations of current dialogue data and RG models: fewer than half of the responses in current data are rated as high quality (sensible, specific, and interesting), and models trained on this data produce responses of even lower quality, while most Reflect responses are judged high quality. Next, we analyze whether CG can help models produce better-quality responses by using Reflect CG to guide RG models. Surprisingly, we find that simply prompting GPT3 to "think" about CG generates 30% more quality responses, showing promising benefits of integrating CG into the RG process.
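The final finding, that prompting GPT3 to "think" about common ground before responding improves response quality, lends itself to a two-step prompting sketch. The snippet below is a minimal illustration of that idea under stated assumptions, not the authors' exact prompts or pipeline: the `complete` helper, the prompt wording, and the model name are hypothetical stand-ins, and it assumes access to an OpenAI-compatible text-completion endpoint.

```python
# Hypothetical two-step "reflect before responding" prompting sketch.
# The prompt wording, model name, and client usage are illustrative
# assumptions, not the paper's exact setup.
import os
from openai import OpenAI  # assumes the openai Python package (v1+) is installed

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def complete(prompt: str, max_tokens: int = 64) -> str:
    """Call a text-completion model and return the generated text."""
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # stand-in for the GPT3 engine used in the paper
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.7,
    )
    return resp.choices[0].text.strip()

def respond_with_common_ground(dialogue_history: str) -> str:
    # Step 1 ("reflect"): elicit an explicit common-ground inference,
    # e.g. what the speaker likely feels, wants, or believes.
    inference_prompt = (
        f"Dialogue:\n{dialogue_history}\n\n"
        "Before replying, state one thing the listener can infer about "
        "what the speaker feels or wants:\n"
    )
    inference = complete(inference_prompt)

    # Step 2: condition the response on that inference instead of
    # replying "reflexively" from the dialogue history alone.
    response_prompt = (
        f"Dialogue:\n{dialogue_history}\n\n"
        f"Inferred common ground: {inference}\n\n"
        "Write a specific, interesting next response that uses this inference:\n"
    )
    return complete(response_prompt)

if __name__ == "__main__":
    history = "A: I finally finished my thesis draft last night.\nB:"
    print(respond_with_common_ground(history))
```

The key design point this sketch illustrates is the separation of inference from generation: the model is first asked to make the shared knowledge explicit, and only then to respond conditioned on it, which is the behavior the abstract contrasts with "reflexive" response generation.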