Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces?
Samuel Holmes, A. Moorhead, R. Bond, Huiru Zheng, V. Coates, M. McTear
Proceedings of the 31st European Conference on Cognitive Ergonomics, 2019-09-10
DOI: 10.1145/3335082.3335094
Citations: 70
Abstract
Chatbots are becoming increasingly popular as a human-computer interface. The best practices normally applied to User Experience (UX) design cannot easily be applied to chatbots, nor can conventional usability testing techniques guarantee accuracy. WeightMentor is a bespoke self-help motivational tool for weight loss maintenance. This study addresses the following four research questions: (1) How usable is the WeightMentor chatbot, according to conventional usability methods? (2) To what extent do different conventional usability questionnaires correlate when evaluating chatbot usability, and how do they correlate with a tailored chatbot usability survey score? (3) What is the optimum number of users required to identify chatbot usability issues? (4) How many task repetitions are required for first-time chatbot users to reach optimum task performance (i.e. efficiency based on task completion times)? This paper describes the procedure for testing the WeightMentor chatbot, assesses correlation between typical usability testing metrics, and suggests that conventional wisdom on participant numbers for identifying usability issues may not apply to chatbots. The study design was a usability study. WeightMentor was tested using a pre-determined usability testing protocol, evaluating ease of task completion, unique usability errors, and participant opinions on the chatbot (collected using usability questionnaires). WeightMentor usability scores were generally high, and correlation between questionnaires was strong. The optimum number of users for identifying chatbot usability errors was 26, which challenges previous research. Chatbot users reached optimum proficiency in tasks after just one repetition.
Usability test outcomes confirm what is already known about chatbots: they are highly usable (due to their simple interface and conversation-driven functionality), but conventional methods for assessing usability and user experience may not be as accurate when applied to chatbots.
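The finding that 26 users were needed to surface the chatbot's usability errors can be put in context with the classic Nielsen–Landauer problem-discovery model, under which each test user independently uncovers a given problem with probability p, so n users are expected to find a proportion 1 - (1 - p)^n of all problems. The sketch below is an illustration of that standard model, not the paper's own analysis; the value p = 0.31 is Nielsen's oft-cited average for traditional interfaces, and p = 0.07 is an assumed value chosen only to show what per-user discovery rate would make 26 users necessary.

```python
import math

def expected_discovery(n_users: int, p: float) -> float:
    """Expected proportion of usability problems found by n_users,
    assuming each user independently finds a problem with probability p
    (the Nielsen-Landauer discovery model)."""
    return 1.0 - (1.0 - p) ** n_users

def users_needed(target: float, p: float) -> int:
    """Smallest n such that the expected discovery proportion reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# With Nielsen's classic p = 0.31, five users find roughly 84% of problems,
# the basis of the well-known "five users are enough" heuristic.
print(round(expected_discovery(5, 0.31), 3))

# If a chatbot required 26 users for comparable coverage, the implied
# per-user discovery rate would be far lower -- around p = 0.07
# (hypothetical value for illustration): 26 users then find ~85%.
print(round(expected_discovery(26, 0.07), 3))
```

Under this model, the paper's result of 26 users is consistent with individual chatbot sessions exposing a much smaller share of the total problem set than sessions with conventional GUIs, which is one way to frame why the traditional small-sample heuristic may not transfer.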