Interactive Evaluation of Conversational Agents: Reflections on the Impact of Search Task Design

Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval. Published: 2020-09-14. DOI: 10.1145/3409256.3409814

Mateusz Dubiel, Martin Halvey, L. Azzopardi, Sylvain Daronnat

Citations: 2

Abstract

Undertaking an interactive evaluation of goal-oriented conversational agents (CAs) is challenging: it requires the search task to be realistic and relatable while accounting for the users' cognitive limitations. In the current paper we discuss findings of two Wizard of Oz studies and provide our reflections regarding the impact of different interactive search task designs on participants' performance, satisfaction and cognitive workload. In the first study, we tasked participants with finding the cheapest flight that met a certain departure time. In the second study, we added a further criterion, "travel time", and asked participants to find a flight option that offered a good trade-off between price and travel time. We found that using search tasks where participants need to decide between several competing search criteria (price vs. time) led to higher search involvement and lower variance in usability and cognitive workload ratings between different CAs. We hope that our results will provoke discussion on how to make the evaluation of voice-only goal-oriented CAs more reliable and ecologically valid.

