理解文本高亮在众包任务中的影响

Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing Pub Date : 2019-09-06 DOI:10.1609/hcomp.v7i1.5268

Jorge Ramírez, Marcos Báez, F. Casati, B. Benatallah

{"title":"理解文本高亮在众包任务中的影响","authors":"Jorge Ramírez, Marcos Báez, F. Casati, B. Benatallah","doi":"10.1609/hcomp.v7i1.5268","DOIUrl":null,"url":null,"abstract":"Text classification is one of the most common goals of machine learning (ML) projects, and also one of the most frequent human intelligence tasks in crowdsourcing platforms. ML has mixed success in such tasks depending on the nature of the problem, while crowd-based classification has proven to be surprisingly effective, but can be expensive. Recently, hybrid text classification algorithms, combining human computation and machine learning, have been proposed to improve accuracy and reduce costs. One way to do so is to have ML highlight or emphasize portions of text that it believes to be more relevant to the decision. Humans can then rely only on this text or read the entire text if the highlighted information is insufficient. In this paper, we investigate if and under what conditions highlighting selected parts of the text can (or cannot) improve classification cost and/or accuracy, and in general how it affects the process and outcome of the human intelligence tasks. We study this through a series of crowdsourcing experiments running over different datasets and with task designs imposing different cognitive demands. Our findings suggest that highlighting is effective in reducing classification effort but does not improve accuracy - and in fact, low-quality highlighting can decrease it.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":"334 1","pages":"144-152"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Understanding the Impact of Text Highlighting in Crowdsourcing Tasks\",\"authors\":\"Jorge Ramírez, Marcos Báez, F. Casati, B. Benatallah\",\"doi\":\"10.1609/hcomp.v7i1.5268\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text classification is one of the most common goals of machine learning (ML) projects, and also one of the most frequent human intelligence tasks in crowdsourcing platforms. ML has mixed success in such tasks depending on the nature of the problem, while crowd-based classification has proven to be surprisingly effective, but can be expensive. Recently, hybrid text classification algorithms, combining human computation and machine learning, have been proposed to improve accuracy and reduce costs. One way to do so is to have ML highlight or emphasize portions of text that it believes to be more relevant to the decision. Humans can then rely only on this text or read the entire text if the highlighted information is insufficient. In this paper, we investigate if and under what conditions highlighting selected parts of the text can (or cannot) improve classification cost and/or accuracy, and in general how it affects the process and outcome of the human intelligence tasks. We study this through a series of crowdsourcing experiments running over different datasets and with task designs imposing different cognitive demands. Our findings suggest that highlighting is effective in reducing classification effort but does not improve accuracy - and in fact, low-quality highlighting can decrease it.\",\"PeriodicalId\":87339,\"journal\":{\"name\":\"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing\",\"volume\":\"334 1\",\"pages\":\"144-152\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1609/hcomp.v7i1.5268\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/hcomp.v7i1.5268","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

文本分类是机器学习(ML)项目最常见的目标之一，也是众包平台中最常见的人类智能任务之一。根据问题的性质，机器学习在这些任务中取得了不同程度的成功，而基于人群的分类已被证明是惊人的有效，但可能代价高昂。近年来，人们提出了将人工计算和机器学习相结合的混合文本分类算法，以提高准确率和降低成本。一种方法是让机器学习突出显示或强调它认为与决策更相关的文本部分。然后，如果高亮显示的信息不足，人们可以只依赖这篇文章，或者阅读整篇文章。在本文中，我们研究了在什么条件下突出文本的选定部分是否可以(或不能)提高分类成本和/或准确性，以及它通常如何影响人类智能任务的过程和结果。我们通过一系列在不同数据集上运行的众包实验以及施加不同认知需求的任务设计来研究这一点。我们的研究结果表明，高亮在减少分类工作量方面是有效的，但并不能提高准确率——事实上，低质量的高亮会降低准确率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Understanding the Impact of Text Highlighting in Crowdsourcing Tasks

Text classification is one of the most common goals of machine learning (ML) projects, and also one of the most frequent human intelligence tasks in crowdsourcing platforms. ML has mixed success in such tasks depending on the nature of the problem, while crowd-based classification has proven to be surprisingly effective, but can be expensive. Recently, hybrid text classification algorithms, combining human computation and machine learning, have been proposed to improve accuracy and reduce costs. One way to do so is to have ML highlight or emphasize portions of text that it believes to be more relevant to the decision. Humans can then rely only on this text or read the entire text if the highlighted information is insufficient. In this paper, we investigate if and under what conditions highlighting selected parts of the text can (or cannot) improve classification cost and/or accuracy, and in general how it affects the process and outcome of the human intelligence tasks. We study this through a series of crowdsourcing experiments running over different datasets and with task designs imposing different cognitive demands. Our findings suggest that highlighting is effective in reducing classification effort but does not improve accuracy - and in fact, low-quality highlighting can decrease it.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing

自引率

0.00%

发文量