Comparing Experts and Novices for AI Data Work: Insights on Allocating Human Intelligence to Design a Conversational Agent

Lu Sun, Yuhan Liu, Grace Joseph, Zhou Yu, Haiyi Zhu, Steven W. Dow
{"title":"Comparing Experts and Novices for AI Data Work: Insights on Allocating Human Intelligence to Design a Conversational Agent","authors":"Lu Sun, Yuhan Liu, Grace Joseph, Zhou Yu, Haiyi Zhu, Steven W. Dow","doi":"10.1609/hcomp.v10i1.21999","DOIUrl":null,"url":null,"abstract":"Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice to help educate volunteer listeners at 7cups.com. We then recruited experienced listeners (domain experts) and MTurk novice workers (crowd workers) to conduct tasks to improve the chatbot with different levels of complexity. Novice crowds perform comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work to be tedious and repetitive. We offer design considerations for allocating crowd workers and experts on input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/hcomp.v10i1.21999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap, on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade-off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice to help educate volunteer listeners at 7cups.com. We then recruited experienced listeners (domain experts) and MTurk novice workers (crowd workers) to carry out chatbot-improvement tasks of varying complexity. Novice crowds performed comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work to be tedious and repetitive. We offer design considerations for allocating crowd workers and experts on input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.
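To make the two task types the abstract contrasts concrete, here is a minimal, hypothetical Python sketch of how such human-input tasks might be represented and routed. The class names, fields, example label, and the `route_task` rule are illustrative assumptions based only on the abstract; they are not the authors' actual task schema or system.

```python
from dataclasses import dataclass

# Hypothetical illustration of the two task types the study contrasts.
# All names and fields are assumptions, not the paper's actual schema.

@dataclass
class ClassificationCorrectionTask:
    """Understanding-oriented task: verify or fix the intent label
    the preliminary chatbot assigned to a user statement."""
    user_statement: str         # e.g., a message from a help-seeker
    predicted_intent: str       # label produced by the chatbot
    corrected_intent: str = ""  # worker's fix; empty if prediction was right

@dataclass
class DialogueGenerationTask:
    """Generative task: author a new line of chatbot dialogue for a
    given point in a simulated mental-health support conversation."""
    conversation_context: list[str]  # preceding turns in the conversation
    authored_response: str = ""      # worker-written chatbot reply

def route_task(task) -> str:
    """Toy allocation rule reflecting the abstract's finding: novices
    suffice for understanding tasks; experts add value on generation."""
    if isinstance(task, ClassificationCorrectionTask):
        return "crowd_worker"   # novices performed comparably to experts
    return "domain_expert"      # experts showed higher quality/novelty/emotion

# Usage example (hypothetical data):
task = ClassificationCorrectionTask(
    user_statement="I can't sleep and I feel anxious all the time.",
    predicted_intent="sleep_issue",
)
print(route_task(task))  # -> "crowd_worker"
```

The point of the routing rule is only to restate the paper's allocation insight in executable form: understanding-only tasks can go to the cheaper, easier-to-recruit crowd, while generative tasks are reserved for domain experts.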