Comparing Experts and Novices for AI Data Work: Insights on Allocating Human Intelligence to Design a Conversational Agent

Lu Sun, Yuhan Liu, Grace Joseph, Zhou Yu, Haiyi Zhu, Steven W. Dow
{"title":"Comparing Experts and Novices for AI Data Work: Insights on Allocating Human Intelligence to Design a Conversational Agent","authors":"Lu Sun, Yuhan Liu, Grace Joseph, Zhou Yu, Haiyi Zhu, Steven W. Dow","doi":"10.1609/hcomp.v10i1.21999","DOIUrl":null,"url":null,"abstract":"Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice to help educate volunteer listeners at 7cups.com. We then recruited experienced listeners (domain experts) and MTurk novice workers (crowd workers) to conduct tasks to improve the chatbot with different levels of complexity. Novice crowds perform comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work to be tedious and repetitive. We offer design considerations for allocating crowd workers and experts on input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/hcomp.v10i1.21999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap, on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade-off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice to help educate volunteer listeners at 7cups.com. We then recruited experienced listeners (domain experts) and MTurk novice workers (crowd workers) to carry out chatbot-improvement tasks of varying complexity. Novice crowds performed comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work to be tedious and repetitive. We offer design considerations for allocating crowd workers and experts on input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.
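To make the two task types the abstract contrasts concrete, here is a minimal, hypothetical Python sketch of how such human-input tasks might be represented and routed. The class names, fields, example label, and the `route_task` rule are illustrative assumptions based only on the abstract; they are not the authors' actual task schema or system.

```python
from dataclasses import dataclass

# Hypothetical illustration of the two task types the study contrasts.
# All names and fields are assumptions, not the paper's actual schema.

@dataclass
class ClassificationCorrectionTask:
    """Understanding-oriented task: verify or fix the intent label
    the preliminary chatbot assigned to a user statement."""
    user_statement: str         # e.g., a message from a help-seeker
    predicted_intent: str       # label produced by the chatbot
    corrected_intent: str = ""  # worker's fix; empty if prediction was right

@dataclass
class DialogueGenerationTask:
    """Generative task: author a new line of chatbot dialogue for a
    given point in a simulated mental-health support conversation."""
    conversation_context: list[str]  # preceding turns in the conversation
    authored_response: str = ""      # worker-written chatbot reply

def route_task(task) -> str:
    """Toy allocation rule reflecting the abstract's finding: novices
    suffice for understanding tasks; experts add value on generation."""
    if isinstance(task, ClassificationCorrectionTask):
        return "crowd_worker"   # novices performed comparably to experts
    return "domain_expert"      # experts showed higher quality/novelty/emotion

# Usage example (hypothetical data):
task = ClassificationCorrectionTask(
    user_statement="I can't sleep and I feel anxious all the time.",
    predicted_intent="sleep_issue",
)
print(route_task(task))  # -> "crowd_worker"
```

The point of the routing rule is only to restate the paper's allocation insight in executable form: understanding-only tasks can go to the cheaper, easier-to-recruit crowd, while generative tasks are reserved for domain experts.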