用众包中机器学习模型的拟合优度衡量员工素质

Proceedings of the 25th International Database Engineering & Applications Symposium Pub Date : 2021-07-14 DOI:10.1145/3472163.3472279

Yumiko Suzuki

{"title":"用众包中机器学习模型的拟合优度衡量员工素质","authors":"Yumiko Suzuki","doi":"10.1145/3472163.3472279","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a method for predicting the quality of crowdsourcing workers using the goodness-of-fit (GoF) of machine learning models. We assume a relationship between the quality of workers and the quality of machine-learning models using the outcomes of the workers as training data. This assumption means that if worker quality is high, a machine-learning classifier constructed using the worker’s outcomes can easily predict the outcomes of the worker. If this assumption is confirmed, we can measure the worker quality without using the correct answer sets, and then the requesters can reduce the time and effort. However, if the outcomes by workers are low quality, the input tweet does not correspond to the outcomes. Therefore, if we construct a tweet classifier using input tweets and the classified results by the worker, the prediction of the outcomes by the classifier and that by the workers should differ. We assume that the GoF scores, such as accuracy and F1 scores of the test set using this classifier, correlates to worker quality. Therefore, we can predict worker quality using the GoF scores. In our experiment, we did the tweet classification task using crowdsourcing. We confirmed that the GoF scores and the quality of workers correlate. These results show that we can predict the quality of workers using the GoF scores.","PeriodicalId":242683,"journal":{"name":"Proceedings of the 25th International Database Engineering & Applications Symposium","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Measuring Quality of Workers by Goodness-of-Fit of Machine Learning Model in Crowdsourcing\",\"authors\":\"Yumiko Suzuki\",\"doi\":\"10.1145/3472163.3472279\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a method for predicting the quality of crowdsourcing workers using the goodness-of-fit (GoF) of machine learning models. We assume a relationship between the quality of workers and the quality of machine-learning models using the outcomes of the workers as training data. This assumption means that if worker quality is high, a machine-learning classifier constructed using the worker’s outcomes can easily predict the outcomes of the worker. If this assumption is confirmed, we can measure the worker quality without using the correct answer sets, and then the requesters can reduce the time and effort. However, if the outcomes by workers are low quality, the input tweet does not correspond to the outcomes. Therefore, if we construct a tweet classifier using input tweets and the classified results by the worker, the prediction of the outcomes by the classifier and that by the workers should differ. We assume that the GoF scores, such as accuracy and F1 scores of the test set using this classifier, correlates to worker quality. Therefore, we can predict worker quality using the GoF scores. In our experiment, we did the tweet classification task using crowdsourcing. We confirmed that the GoF scores and the quality of workers correlate. These results show that we can predict the quality of workers using the GoF scores.\",\"PeriodicalId\":242683,\"journal\":{\"name\":\"Proceedings of the 25th International Database Engineering & Applications Symposium\",\"volume\":\"66 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th International Database Engineering & Applications Symposium\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3472163.3472279\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Database Engineering & Applications Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3472163.3472279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在本文中，我们提出了一种使用机器学习模型的拟合优度(GoF)来预测众包工人质量的方法。我们使用工人的结果作为训练数据，假设工人的质量和机器学习模型的质量之间存在关系。这个假设意味着，如果工人的素质很高，使用工人的结果构建的机器学习分类器可以很容易地预测工人的结果。如果这个假设得到证实，我们可以在不使用正确答案集的情况下衡量工作人员的质量，然后请求者可以减少时间和精力。但是，如果工作人员的结果是低质量的，则输入的tweet与结果不对应。因此，如果我们使用输入的推文和工作人员的分类结果构建推文分类器，分类器对结果的预测和工作人员的预测应该不同。我们假设GoF分数，例如使用此分类器的测试集的准确性和F1分数，与工人质量相关。因此，我们可以使用GoF分数来预测工人的素质。在我们的实验中，我们使用众包来完成tweet分类任务。我们证实了GoF分数和工人的素质是相关的。这些结果表明，我们可以使用GoF分数来预测工人的素质。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Measuring Quality of Workers by Goodness-of-Fit of Machine Learning Model in Crowdsourcing

In this paper, we propose a method for predicting the quality of crowdsourcing workers using the goodness-of-fit (GoF) of machine learning models. We assume a relationship between the quality of workers and the quality of machine-learning models using the outcomes of the workers as training data. This assumption means that if worker quality is high, a machine-learning classifier constructed using the worker’s outcomes can easily predict the outcomes of the worker. If this assumption is confirmed, we can measure the worker quality without using the correct answer sets, and then the requesters can reduce the time and effort. However, if the outcomes by workers are low quality, the input tweet does not correspond to the outcomes. Therefore, if we construct a tweet classifier using input tweets and the classified results by the worker, the prediction of the outcomes by the classifier and that by the workers should differ. We assume that the GoF scores, such as accuracy and F1 scores of the test set using this classifier, correlates to worker quality. Therefore, we can predict worker quality using the GoF scores. In our experiment, we did the tweet classification task using crowdsourcing. We confirmed that the GoF scores and the quality of workers correlate. These results show that we can predict the quality of workers using the GoF scores.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 25th International Database Engineering & Applications Symposium

自引率

0.00%

发文量