Jahna Otterbacher, Pinar Barlas, S. Kleanthous, K. Kyriakou
{"title":"How Do We Talk about Other People? Group (Un)Fairness in Natural Language Image Descriptions","authors":"Jahna Otterbacher, Pinar Barlas, S. Kleanthous, K. Kyriakou","doi":"10.1609/hcomp.v7i1.5267","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5267","url":null,"abstract":"Crowdsourcing plays a key role in developing algorithms for image recognition or captioning. Major datasets, such as MS COCO or Flickr30K, have been built by eliciting natural language descriptions of images from workers. Yet such elicitation tasks are susceptible to human biases, including stereotyping people depicted in images. Given the growing concerns surrounding discrimination in algorithms, as well as in the data used to train them, it is necessary to take a critical look at this practice. We conduct experiments at Figure Eight using a controlled set of people images. Men and women of various races are positioned in the same manner, wearing a grey t-shirt. We prompt workers for 10 descriptive labels, and consider them using the human-centric approach, which assumes reporting bias. We find that “what’s worth saying” about these uniform images often differs as a function of the gender and race of the depicted person, violating the notion of group fairness. Although this diversity in natural language people descriptions is expected and often beneficial, it could result in automated disparate impact if not managed properly.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"26 1","pages":"106-114"},"PeriodicalIF":0.0,"publicationDate":"2019-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87437016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junwon Park, Ranjay Krishna, Pranav Khadpe, Li Fei-Fei, Michael S. Bernstein
{"title":"AI-Based Request Augmentation to Increase Crowdsourcing Participation","authors":"Junwon Park, Ranjay Krishna, Pranav Khadpe, Li Fei-Fei, Michael S. Bernstein","doi":"10.1609/hcomp.v7i1.5282","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5282","url":null,"abstract":"To support the massive data requirements of modern supervised machine learning (ML) algorithms, crowdsourcing systems match volunteer contributors to appropriate tasks. Such systems learn what types of tasks contributors are interested to complete. In this paper, instead of focusing on what to ask, we focus on learning how to ask: how to make relevant and interesting requests to encourage crowdsourcing participation. We introduce a new technique that augments questions with ML-based request strategies drawn from social psychology. We also introduce a contextual bandit algorithm to select which strategy to apply for a given task and contributor. We deploy our approach to collect volunteer data from Instagram for the task of visual question answering (VQA), an important task in computer vision and natural language processing that has enabled numerous human-computer interaction applications. For example, when encountering a user’s Instagram post that contains the ornate Trevi Fountain in Rome, our approach learns to augment its original raw question “Where is this place?” with image-relevant compliments such as “What a great statue!” or with travel-relevant justifications such as “I would like to visit this place”, increasing the user’s likelihood of answering the question and thus providing a label. We deploy our agent on Instagram to ask questions about social media images, finding that the response rate improves from 15.8% with unaugmented questions to 30.54% with baseline rule-based strategies and to 58.1% with ML-based strategies.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"33 1","pages":"115-124"},"PeriodicalIF":0.0,"publicationDate":"2019-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90844186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Not Everyone Writes Good Examples but Good Examples Can Come from Anywhere","authors":"Shayan Doroudi, Ece Kamar, E. Brunskill","doi":"10.1609/hcomp.v7i1.5269","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5269","url":null,"abstract":"In many online environments, such as massive open online courses and crowdsourcing platforms, many people solve similar complex tasks. As a byproduct of solving these tasks, a pool of artifacts are created that may be able to help others perform better on similar tasks. In this paper, we explore whether work that is naturally done by crowdworkers can be used as examples to help future crowdworkers perform better on similar tasks. We explore this in the context of a product comparison review task, where workers must compare and contrast pairs of similar products. We first show that randomly presenting one or two peer-generated examples does not significantly improve performance on future tasks. In a second experiment, we show that presenting examples that are of sufficiently high quality leads to a statistically significant improvement in performance of future workers on a near transfer task. Moreover, our results suggest that even among high quality examples, there are differences in how effective the examples are, indicating that quality is not a perfect proxy for pedagogical value.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"85 1","pages":"12-21"},"PeriodicalIF":0.0,"publicationDate":"2019-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81078585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chris Madge, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Silviu Paun, Massimo Poesio
{"title":"Progression in a Language Annotation Game with a Purpose","authors":"Chris Madge, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Silviu Paun, Massimo Poesio","doi":"10.1609/hcomp.v7i1.5276","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5276","url":null,"abstract":"Within traditional games design, incorporating progressive difficulty is considered of fundamental importance. But despite the widespread intuition that progression could have clear benefits in Games-With-A-Purpose (GWAPs)–e.g., for training non-expert annotators to produce more complex judgements– progression is not in fact a prominent feature of GWAPs; and there is even less evidence on its effects. In this work we present an approach to progression in GWAPs that generalizes to different annotation tasks with minimal, if any, dependency on gold annotated data. Using this method we observe a statistically significant increase in accuracy over randomly showing items to annotators.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"12 1","pages":"77-85"},"PeriodicalIF":0.0,"publicationDate":"2019-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91523853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecast Aggregation via Peer Prediction","authors":"Juntao Wang, Yang Liu, Yiling Chen","doi":"10.1609/hcomp.v9i1.18946","DOIUrl":"https://doi.org/10.1609/hcomp.v9i1.18946","url":null,"abstract":"Crowdsourcing enables the solicitation of forecasts on a variety of prediction tasks from distributed groups of people. How to aggregate the solicited forecasts, which may vary in quality, into an accurate final prediction remains a challenging yet critical question. Studies have found that weighing expert forecasts more in aggregation can improve the accuracy of the aggregated prediction. However, this approach usually requires access to the historical performance data of the forecasters, which are often not available. In this paper, we study the problem of aggregating forecasts without having historical performance data. We propose using peer prediction methods, a family of mechanisms initially designed to truthfully elicit private information in the absence of ground truth verification, to assess the expertise of forecasters, and then using this assessment to improve forecast aggregation. We evaluate our peer-prediction-aided aggregators on a diverse collection of 14 human forecast datasets. Compared with a variety of existing aggregators, our aggregators achieve a significant and consistent improvement on aggregation accuracy measured by the Brier score and the log score. Our results reveal the effectiveness of identifying experts to improve aggregation even without historical data.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"26 1","pages":"131-142"},"PeriodicalIF":0.0,"publicationDate":"2019-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78326762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andi Peng, Besmira Nushi, Emre Kıcıman, K. Quinn, Siddharth Suri, Ece Kamar
{"title":"What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring","authors":"Andi Peng, Besmira Nushi, Emre Kıcıman, K. Quinn, Siddharth Suri, Ece Kamar","doi":"10.1609/hcomp.v7i1.5281","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5281","url":null,"abstract":"Although systematic biases in decision-making are widely documented, the ways in which they emerge from different sources is less understood. We present a controlled experimental platform to study gender bias in hiring by decoupling the effect of world distribution (the gender breakdown of candidates in a specific profession) from bias in human decision-making. We explore the effectiveness of representation criteria, fixed proportional display of candidates, as an intervention strategy for mitigation of gender bias by conducting experiments measuring human decision-makers’ rankings for who they would recommend as potential hires. Experiments across professions with varying gender proportions show that balancing gender representation in candidate slates can correct biases for some professions where the world distribution is skewed, although doing so has no impact on other professions where human persistent preferences are at play. We show that the gender of the decision-maker, complexity of the decision-making task and over- and under-representation of genders in the candidate slate can all impact the final decision. By decoupling sources of bias, we can better isolate strategies for bias mitigation in human-in-the-loop systems.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"12 1","pages":"125-134"},"PeriodicalIF":0.0,"publicationDate":"2019-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81321755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jorge Ramírez, Marcos Báez, F. Casati, B. Benatallah
{"title":"Understanding the Impact of Text Highlighting in Crowdsourcing Tasks","authors":"Jorge Ramírez, Marcos Báez, F. Casati, B. Benatallah","doi":"10.1609/hcomp.v7i1.5268","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5268","url":null,"abstract":"Text classification is one of the most common goals of machine learning (ML) projects, and also one of the most frequent human intelligence tasks in crowdsourcing platforms. ML has mixed success in such tasks depending on the nature of the problem, while crowd-based classification has proven to be surprisingly effective, but can be expensive. Recently, hybrid text classification algorithms, combining human computation and machine learning, have been proposed to improve accuracy and reduce costs. One way to do so is to have ML highlight or emphasize portions of text that it believes to be more relevant to the decision. Humans can then rely only on this text or read the entire text if the highlighted information is insufficient. In this paper, we investigate if and under what conditions highlighting selected parts of the text can (or cannot) improve classification cost and/or accuracy, and in general how it affects the process and outcome of the human intelligence tasks. We study this through a series of crowdsourcing experiments running over different datasets and with task designs imposing different cognitive demands. Our findings suggest that highlighting is effective in reducing classification effort but does not improve accuracy - and in fact, low-quality highlighting can decrease it.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"334 1","pages":"144-152"},"PeriodicalIF":0.0,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79731584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arijit Ray, Yi Yao, Rakesh Kumar, Ajay Divakaran, Giedrius Burachas
{"title":"Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval","authors":"Arijit Ray, Yi Yao, Rakesh Kumar, Ajay Divakaran, Giedrius Burachas","doi":"10.1609/hcomp.v7i1.5275","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5275","url":null,"abstract":"While there have been many proposals on making AI algorithms explainable, few have attempted to evaluate the impact of AI-generated explanations on human performance in conducting human-AI collaborative tasks. To bridge the gap, we propose a Twenty-Questions style collaborative image retrieval game, Explanation-assisted Guess Which (ExAG), as a method of evaluating the efficacy of explanations (visual evidence or textual justification) in the context of Visual Question Answering (VQA). In our proposed ExAG, a human user needs to guess a secret image picked by the VQA agent by asking natural language questions to it. We show that overall, when AI explains its answers, users succeed more often in guessing the secret image correctly. Notably, a few correct explanations can readily improve human performance when VQA answers are mostly incorrect as compared to no-explanation games. Furthermore, we also show that while explanations rated as “helpful” significantly improve human performance, “incorrect” and “unhelpful” explanations can degrade performance as compared to no-explanation games. Our experiments, therefore, demonstrate that ExAG is an effective means to evaluate the efficacy of AI-generated explanation on a human-AI collaborative task.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"41 1","pages":"153-161"},"PeriodicalIF":0.0,"publicationDate":"2019-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85132437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourced PAC Learning under Classification Noise","authors":"Shelby Heinecke, L. Reyzin","doi":"10.1609/hcomp.v7i1.5279","DOIUrl":"https://doi.org/10.1609/hcomp.v7i1.5279","url":null,"abstract":"In this paper, we analyze PAC learnability from labels produced by crowdsourcing. In our setting, unlabeled examples are drawn from a distribution and labels are crowdsourced from workers who operate under classification noise, each with their own noise parameter. We develop an end-to-end crowdsourced PAC learning algorithm that takes unlabeled data points as input and outputs a trained classifier. Our three-step algorithm incorporates majority voting, pure-exploration bandits, and noisy-PAC learning. We prove several guarantees on the number of tasks labeled by workers for PAC learning in this setting and show that our algorithm improves upon the baseline by reducing the total number of tasks given to workers. We demonstrate the robustness of our algorithm by exploring its application to additional realistic crowdsourcing settings.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"19 1","pages":"41-49"},"PeriodicalIF":0.0,"publicationDate":"2019-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84667444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing New Tools to Start Lean and Succeed in Entrepreneurship","authors":"Priti Ambani","doi":"10.4018/978-1-5225-8362-2.ch021","DOIUrl":"https://doi.org/10.4018/978-1-5225-8362-2.ch021","url":null,"abstract":"The very essence of the new entrepreneur is shattering tradition. On the heels of the new social internet, we are seeing the rise of the solo-entrepreneur or intrapreneur who embraces globalisation, failure and successes and collaboration. Powerful networks brought together by crowdsourcing are supplying the tools to start lean, innovate and solve complex problems. This chapter by Priti Ambani explores the changing ecosystem and the effect of networked crowds on starting lean and succeeding with entrepreneurship.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81405029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}