Evaluation Understudy for Dialogue Coherence Models

Sudeep Gandhe, D. Traum
{"title":"Evaluation Understudy for Dialogue Coherence Models","authors":"Sudeep Gandhe, D. Traum","doi":"10.3115/1622064.1622098","DOIUrl":null,"url":null,"abstract":"Evaluating a dialogue system is seen as a major challenge within the dialogue research community. Due to the very nature of the task, most of the evaluation methods need a substantial amount of human involvement. Following the tradition in machine translation, summarization and discourse coherence modeling, we introduce the the idea of evaluation understudy for dialogue coherence models. Following (Lapata, 2006), we use the information ordering task as a testbed for evaluating dialogue coherence models. This paper reports findings about the reliability of the information ordering task as applied to dialogues. We find that simple n-gram co-occurrence statistics similar in spirit to BLEU (Papineni et al., 2001) correlate very well with human judgments for dialogue coherence","PeriodicalId":426429,"journal":{"name":"SIGDIAL Workshop","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIGDIAL Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3115/1622064.1622098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

Evaluating a dialogue system is seen as a major challenge within the dialogue research community. Due to the very nature of the task, most of the evaluation methods need a substantial amount of human involvement. Following the tradition in machine translation, summarization and discourse coherence modeling, we introduce the the idea of evaluation understudy for dialogue coherence models. Following (Lapata, 2006), we use the information ordering task as a testbed for evaluating dialogue coherence models. This paper reports findings about the reliability of the information ordering task as applied to dialogues. We find that simple n-gram co-occurrence statistics similar in spirit to BLEU (Papineni et al., 2001) correlate very well with human judgments for dialogue coherence
对话连贯模型的评价替补
评估对话系统被视为对话研究界的一个主要挑战。由于任务的性质,大多数评估方法需要大量的人力参与。继机器翻译、摘要和语篇连贯建模的传统之后,我们引入了评价替代的思想来建立对话连贯模型。接下来(Lapata, 2006),我们使用信息排序任务作为评估对话一致性模型的测试平台。本文报道了应用于对话的信息排序任务的可靠性研究结果。我们发现,在精神上与BLEU相似的简单n-gram共现统计(Papineni et al., 2001)与人类对对话连贯性的判断非常相关
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信