Jude Lim, Vikram Mohanty, Terryl Dodson, Kurt Luther
{"title":"BackTrace: A Human-AI Collaborative Approach to Discovering Studio Backdrops in Historical Photographs","authors":"Jude Lim, Vikram Mohanty, Terryl Dodson, Kurt Luther","doi":"10.1609/hcomp.v11i1.27551","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27551","url":null,"abstract":"In historical photo research, the presence of painted backdrops has the potential to help identify subjects, photographers, locations, and events surrounding certain photographs. However, there are few dedicated tools or resources available to aid researchers in this largely manual task. In this paper, we propose BackTrace, a human-AI collaboration system that employs a three-step workflow to retrieve and organize historical photos with similar backdrops. BackTrace is a content-based image retrieval (CBIR) system powered by deep learning that allows for the iterative refinement of search results via user feedback. We evaluated BackTrace with mixed-methods evaluation and found that it successfully aided users in finding photos with similar backdrops and grouping them into collections. Finally, we discuss how our findings can be applied to other domains, as well as implications of deploying BackTrace as a crowdsourcing system.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"9 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Confidence Contours: Uncertainty-Aware Annotation for Medical Semantic Segmentation","authors":"Andre Ye, Quan Ze Chen, Amy Zhang","doi":"10.1609/hcomp.v11i1.27559","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27559","url":null,"abstract":"Medical image segmentation modeling is a high-stakes task where understanding of uncertainty is crucial for addressing visual ambiguity. Prior work has developed segmentation models utilizing probabilistic or generative mechanisms to infer uncertainty from labels where annotators draw a singular boundary. However, as these annotations cannot represent an individual annotator's uncertainty, models trained on them produce uncertainty maps that are difficult to interpret. We propose a novel segmentation representation, Confidence Contours, which uses high- and low-confidence “contours” to capture uncertainty directly, and develop a novel annotation system for collecting contours. We conduct an evaluation on the Lung Image Dataset Consortium (LIDC) and a synthetic dataset. From an annotation study with 30 participants, results show that Confidence Contours provide high representative capacity without considerably higher annotator effort. We also find that general-purpose segmentation models can learn Confidence Contours at the same performance level as standard singular annotations. Finally, from interviews with 5 medical experts, we find that Confidence Contour maps are more interpretable than Bayesian maps due to representation of structural uncertainty.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"9 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Task-Interdependency Model of Complex Collaboration Towards Human-Centered Crowd Work (Extended Abstract)","authors":"David T. Lee, Christos A. Makridis","doi":"10.1609/hcomp.v11i1.27549","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27549","url":null,"abstract":"Mathematical models of crowdsourcing and human computation today largely assume small modular tasks, \"computational primitives\" such as labels, comparisons, or votes requiring little coordination. However, while these models have successfully shown how crowds can accomplish significant objectives, they can inadvertently advance a less-than-human view of crowd workers where workers are treated as low-skilled, replaceable, and untrustworthy, carrying out simple tasks in online labor markets for low pay under algorithmic management. They also fail to capture the unique human capacity for complex collaborative work where the main concerns are how to effectively structure, delegate, and collaborate on work that may be large in scope, underdefined, and highly interdependent. We present a model centered on interdependencies—a phenomenon well understood to be at the core of collaboration—that allows one to formally reason about diverse challenges to complex collaboration. Our model represents tasks as an interdependent collection of subtasks, formalized as a task graph. Each node is a subtask with an arbitrary size parameter. Interdependencies, represented as node and edge weights, impose costs on workers who need to spend time absorbing context of relevant work. Importantly, workers do not have to pay this context cost for work they did themselves. To illustrate, we apply this simple model to diverse aspects of complex collaboration. We examine the limits of scaling complex crowd work, showing how high interdependencies and low task granularity bound work capacity to a constant factor of the contributions of top workers, which is in turn limited when workers are short-term novices. We examine recruitment and upskilling, showing the outsized role top workers play in determining work capacity, and surfacing insights on situated learning through a stylized model of legitimate peripheral participation (LPP). Finally, we turn to the economy as a setting where complex collaborative work already exists, using our model to explore the relationship between coordination intensity and occupational wages. Using occupational data from O*NET and the Bureau of Labor Statistics, we introduce a new index of occupational coordination intensity and validate the predicted positive correlation. We find preliminary evidence that higher coordination intensity occupations are more resistant to displacement by AI based on historical growth in automation and OpenAI data on LLM exposure. Our hope is to spur further development of models that emphasize the collaborative capacities of human workers, bridge models of crowd work and traditional work, and promote AI in roles augmenting human collaboration. The full paper can be found at: https://doi.org/10.48550/arXiv.2309.00160.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"10 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ruohan Zong, Yang Zhang, Frank Stinar, Lanyu Shang, Huimin Zeng, Nigel Bosch, Dong Wang
{"title":"A Crowd–AI Collaborative Approach to Address Demographic Bias for Student Performance Prediction in Online Education","authors":"Ruohan Zong, Yang Zhang, Frank Stinar, Lanyu Shang, Huimin Zeng, Nigel Bosch, Dong Wang","doi":"10.1609/hcomp.v11i1.27560","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27560","url":null,"abstract":"Recent advances in artificial intelligence (AI) and crowdsourcing have shown success in enhancing learning experiences and outcomes in online education. This paper studies a student performance prediction problem where the objective is to predict students' outcomes in online courses based on their behavioral data. In particular, we focus on addressing the limitation of current student performance prediction solutions that often make inaccurate predictions for students from underrepresented demographic groups due to the lack of training data and differences in behavioral patterns across groups. We develop DebiasEdu, a crowd–AI collaborative debias framework that melds the AI and crowd intelligence through 1) a novel gradient-based bias identification mechanism and 2) a bias-aware crowdsourcing interface and bias calibration design to achieve an accurate and fair student performance prediction. Evaluation results on two online courses demonstrate that DebiasEdu consistently outperforms state-of-the-art AI, fair AI, and crowd–AI baselines by achieving an optimized student performance prediction in terms of both accuracy and fairness.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"8 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tyler Malloy, Yinuo Du, Fei Fang, Cleotilde Gonzalez
{"title":"Accounting for Transfer of Learning Using Human Behavior Models","authors":"Tyler Malloy, Yinuo Du, Fei Fang, Cleotilde Gonzalez","doi":"10.1609/hcomp.v11i1.27553","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27553","url":null,"abstract":"An important characteristic of human learning and decision-making is the flexibility with which we rapidly adapt to novel tasks. To this day, models of human behavior have been unable to emulate the ease and success with which humans transfer knowledge in one context to another. Humans rely on a lifetime of experience and a variety of cognitive mechanisms that are difficult to represent computationally. To address this problem, we propose a novel human behavior model that accounts for human transfer of learning using three mechanisms: compositional reasoning, causal inference, and optimal forgetting. To evaluate this proposed model, we introduce an experiment task designed to elicit human transfer of learning under different conditions. Our proposed model demonstrates a more human-like transfer of learning compared to models that optimize transfer or human behavior models that do not directly account for transfer of learning. The results of the ablation testing of the proposed model and a systematic comparison to human data demonstrate the importance of each component of the cognitive model underlying the transfer of learning.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"10 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amy Rechkemmer, Alex C. Williams, Matthew Lease, Li Erran Li
{"title":"Characterizing Time Spent in Video Object Tracking Annotation Tasks: A Study of Task Complexity in Vehicle Tracking","authors":"Amy Rechkemmer, Alex C. Williams, Matthew Lease, Li Erran Li","doi":"10.1609/hcomp.v11i1.27555","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27555","url":null,"abstract":"Video object tracking annotation tasks are a form of complex data labeling that is inherently tedious and time-consuming. Prior studies of these tasks focus primarily on quality of the provided data, leaving much to be learned about how the data was generated and the factors that influenced how it was generated. In this paper, we take steps toward this goal by examining how human annotators spend their time in the context of a video object tracking annotation task. We situate our study in the context of a standard vehicle tracking task with bounding box annotation. Within this setting, we study the role of task complexity by controlling two dimensions of task design -- label constraint and label granularity -- in conjunction with worker experience. Using telemetry and survey data collected from 40 full-time data annotators at a large technology corporation, we find that each dimension of task complexity uniquely affects how annotators spend their time not only during the task, but also before it begins. Furthermore, we find significant misalignment in how time-use was observed and how time-use was self-reported. We conclude by discussing the implications of our findings in the context of video object tracking and the need to better understand how productivity can be defined in data annotation.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Humans Forgo Reward to Instill Fairness into AI","authors":"Lauren S. Treiman, Chien-Ju Ho, Wouter Kool","doi":"10.1609/hcomp.v11i1.27556","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27556","url":null,"abstract":"In recent years, artificial intelligence (AI) has become an integral part of our daily lives, assisting us with decision making. During such interactions, AI algorithms often use human behavior as training input. Therefore, it is important to understand whether people change their behavior when they train AI and if they continue to do so when training does not benefit them. In this work, we conduct behavioral experiments in the context of the ultimatum game to answer these questions. In our version of this game, participants were asked to decide whether to accept or reject proposals of monetary splits made by either other human participants or AI. Some participants were informed that their choices would be used to train AI, while others did not receive this information. In the first experiment, we found that participants were willing to sacrifice personal earnings to train AI to be fair as they became less inclined to accept unfair offers. The second experiment replicated and expanded upon this finding, revealing that participants were motivated to train AI even if they would never encounter it in the future. These findings demonstrate that humans are willing to incur costs to change AI algorithms. Moreover, they suggest that human behavior during AI training does not necessarily align with baseline preferences. This observation poses a challenge for AI development, revealing that it is important for AI algorithms to account for their influence on behavior when recommending choices.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"9 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthew Barker, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Umang Bhatt
{"title":"Selective Concept Models: Permitting Stakeholder Customisation at Test-Time","authors":"Matthew Barker, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Umang Bhatt","doi":"10.1609/hcomp.v11i1.27543","DOIUrl":"https://doi.org/10.1609/hcomp.v11i1.27543","url":null,"abstract":"Concept-based models perform prediction using a set of concepts that are interpretable to stakeholders. However, such models often involve a fixed, large number of concepts, which may place a substantial cognitive load on stakeholders. We propose Selective COncept Models (SCOMs) which make predictions using only a subset of concepts and can be customised by stakeholders at test-time according to their preferences. We show that SCOMs only require a fraction of the total concepts to achieve optimal accuracy on multiple real-world datasets. Further, we collect and release a new dataset, CUB-Sel, consisting of human concept set selections for 900 bird images from the popular CUB dataset. Using CUB-Sel, we show that humans have unique individual preferences for the choice of concepts they prefer to reason about, and struggle to identify the most theoretically informative concepts. The customisation and concept selection provided by SCOM improves the efficiency of interpretation and intervention for stakeholders.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"10 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135873638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing Perceptions of Gerrymandering","authors":"Benjamin Kelly, Inwon Kang, Lirong Xia","doi":"10.1609/hcomp.v10i1.21993","DOIUrl":"https://doi.org/10.1609/hcomp.v10i1.21993","url":null,"abstract":"Gerrymandering is the manipulation of redistricting to influence the results of a set of elections for local representatives. Gerrymandering has the potential to drastically swing power in legislative bodies even with no change in a population’s political views. Identifying gerrymandering and measuring fairness using metrics of proposed district plans is a topic of current research, but there is less work on how such plans will be perceived by voters. Gathering data on such perceptions presents several challenges such as the ambiguous definitions of ‘fair’ and the complexity of real world geography and district plans. We present a dataset collected from an online crowdsourcing platform on a survey asking respondents to mark which of two maps of equal population distribution but different districts appears more ‘fair’ and the reasoning for their decision. We performed preliminary analysis on this data and identified which of several commonly suggested metrics are most predictive of the responses. We found that the maximum perimeter of any district was the most predictive metric, especially with participants who reported that they made their decision based on the shape of the districts.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"50 1","pages":"124-132"},"PeriodicalIF":0.0,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86138199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lu Sun, Yuhan Liu, Grace Joseph, Zhou Yu, Haiyi Zhu, Steven W. Dow
{"title":"Comparing Experts and Novices for AI Data Work: Insights on Allocating Human Intelligence to Design a Conversational Agent","authors":"Lu Sun, Yuhan Liu, Grace Joseph, Zhou Yu, Haiyi Zhu, Steven W. Dow","doi":"10.1609/hcomp.v10i1.21999","DOIUrl":"https://doi.org/10.1609/hcomp.v10i1.21999","url":null,"abstract":"Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade-off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice to help educate volunteer listeners at 7cups.com. We then recruited experienced listeners (domain experts) and MTurk novice workers (crowd workers) to conduct tasks to improve the chatbot with different levels of complexity. Novice crowds perform comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work to be tedious and repetitive. We offer design considerations for allocating crowd workers and experts on input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"34 1","pages":"195-206"},"PeriodicalIF":0.0,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88160388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}