Investigating Author Research Relatedness through Crowdsourcing: A Replication Study on MTurk

IF 2 | Zone 3, Computer Science | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
António Correia, Dennis Paulino, H. Paredes, D. Guimaraes, D. Schneider, Benjamim Fonseca
DOI: 10.1109/CSCWD57460.2023.10152707 (https://doi.org/10.1109/CSCWD57460.2023.10152707)
Journal: Computer Supported Cooperative Work-The Journal of Collaborative Computing
Volume: 26 3
Pages: 77-82
Publication date: 2023-05-24
Publication type: Journal Article
Open access: no
Indexed via: Semanticscholar
Citations: 0

Abstract

Determining the relatedness of publications by detecting similarities and connections between researchers and their outputs can help science stakeholders worldwide to find areas of common interest and potential collaboration. To this end, many studies have tried to explore authorship attribution and research similarity detection through the use of automatic approaches. Nonetheless, inferring author research relatedness from imperfect data containing errors and multiple references to the same entities is a long-standing challenge. In a previous study, we conducted an experiment where a homogeneous crowd of volunteers contributed to a set of author name disambiguation tasks. The results demonstrated an overall accuracy higher than 75% and we also found important effects tied to the confidence level indicated by participants in correct answers. However, this study left many open questions regarding the comparative accuracy of a large heterogeneous crowd with monetary rewards involved. This paper seeks to address some of these unanswered questions by repeating the experiment with a crowd of 140 online paid workers recruited via MTurk’s microtask crowdsourcing platform. Our replication study shows high accuracy for name disambiguation tasks based on authorship-level information and content features. These findings can be of greater informative value since they also explore hints of crowd behavior activity in terms of time duration and mean proportion of clicks per worker with implications for interface and interaction design.
Source journal
Computer Supported Cooperative Work-The Journal of Collaborative Computing (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)
CiteScore: 6.40
Self-citation rate: 4.20%
Articles published: 31
Review time: >12 weeks
Journal description: Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW. The CSCW Journal arose in response to the growing interest in the design, implementation and use of technical systems (including computing, information, and communications technologies) which support people working cooperatively, and its scope remains to encompass the multifarious aspects of research within CSCW and related areas. The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where ‘work’ is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.