Utility of Crowdsourced User Experiments for Measuring the Central Tendency of User Performance to Evaluate Error-Rate Models on GUIs

Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing. Pub Date: 2021-10-04. DOI: 10.1609/hcomp.v9i1.18948

Shota Yamanaka

Pages: 155-165

Citations: 5

Abstract

Crowdsourcing is recognized in the human-computer interaction (HCI) field as a useful way to recruit many participants, for example when designing user interfaces and validating user performance models. In this work, we investigate its effectiveness for evaluating an error-rate prediction model in target pointing tasks. Whereas operational time is observed on every trial, a clicking error (i.e., missing a target) occurs only by chance, with some probability (e.g., 5%). Therefore, traditional laboratory-based experiments need many repetitions to measure the central tendency of error rates. We hypothesize that recruiting many workers lets us keep the number of repetitions per worker much smaller. We collected data from 384 workers and found that existing models of operational time and error rate both fit well (both R^2 > 0.95). A simulation in which we varied the number of participants N_P and the number of repetitions N_repeat showed that the time prediction model was robust to small N_P and N_repeat, whereas the fit of the error-rate model degraded considerably. These findings empirically demonstrate a new utility of crowdsourced user experiments for recruiting many participants, which should be of great use to HCI researchers in their evaluation studies.
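The trade-off between N_P and N_repeat described in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's simulation. It assumes every trial is an independent miss with the same probability (5%, the example value from the abstract) and ignores differences between participants, so it only shows how the uncertainty of a pooled error rate shrinks as the total number of trials grows. The function name, the 12-participant laboratory sample, and the 20 repetitions per condition are hypothetical values chosen for illustration; only the 384 workers come from the abstract.

```python
import math


def error_rate_standard_error(p_error: float, n_participants: int, n_repeat: int) -> float:
    """Standard error of a pooled observed error rate.

    Simplifying assumption (not from the paper): all trials are independent
    Bernoulli draws with the same miss probability p_error.
    """
    total_trials = n_participants * n_repeat
    return math.sqrt(p_error * (1.0 - p_error) / total_trials)


P_ERROR = 0.05   # example miss probability mentioned in the abstract
N_REPEAT = 20    # hypothetical repetitions per participant and condition

# Hypothetical laboratory study with 12 participants.
lab_se = error_rate_standard_error(P_ERROR, n_participants=12, n_repeat=N_REPEAT)

# Crowdsourced study with 384 workers (the sample size reported in the abstract).
crowd_se = error_rate_standard_error(P_ERROR, n_participants=384, n_repeat=N_REPEAT)

print(f"lab   standard error: {lab_se:.4f}")
print(f"crowd standard error: {crowd_se:.4f}")
print(f"ratio lab / crowd:    {lab_se / crowd_se:.2f}")
```

Under these assumptions the standard error depends only on the product of N_P and N_repeat, which is one way to read the hypothesis that many workers can compensate for few repetitions per worker.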
