用项目反应理论评估半监督学习数据集的适宜性

2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) Pub Date : 2021-09-01 DOI:10.1109/SEAA53835.2021.00049

Teodor Fredriksson, D. I. Mattos, J. Bosch, H. H. Olsson

{"title":"用项目反应理论评估半监督学习数据集的适宜性","authors":"Teodor Fredriksson, D. I. Mattos, J. Bosch, H. H. Olsson","doi":"10.1109/SEAA53835.2021.00049","DOIUrl":null,"url":null,"abstract":"In practice, supervised learning algorithms require fully labeled datasets to achieve the high accuracy demanded by current modern applications. However, in industrial settings supervised learning algorithms can perform poorly because of few labeled instances. Semi-supervised learning (SSL) is an automatic labeling approach that utilizes complete labels to infer missing labels in partially complete datasets. The high number of available SSL algorithms and the lack of systematic comparison between them leaves practitioners without guidelines to select the appropriate one for their application. Moreover, each SSL algorithm is often validated and evaluated in a small number of common datasets. However, there is no research that examines what datasets are suitable for comparing different SSL algorihtms. The purpose of this paper is to empirically evaluate the suitability of the datasets commonly used to evaluate and compare different SSL algorithms. We performed a simulation study using twelve datasets of three different datatypes (numerical, text, image) on thirteen different SSL algorithms. The contributions of this paper are two-fold. First, we propose the use of Bayesian congeneric item response theory model to assess the suitability of commonly used datasets. Second, we compare the different SSL algorithms using these datasets. The results show that with except of three datasets, the others have very low discrimination factors and are easily solved by the current algorithms. Additionally, the SSL algorithms have overlapping 90% credible intervals, indicating uncertainty in the difference between the accuracy of these SSL models. The paper concludes suggesting that researchers and practitioners should better consider the choice of datasets used for comparing SSL algorithms.","PeriodicalId":435977,"journal":{"name":"2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Assessing the Suitability of Semi-Supervised Learning Datasets using Item Response Theory\",\"authors\":\"Teodor Fredriksson, D. I. Mattos, J. Bosch, H. H. Olsson\",\"doi\":\"10.1109/SEAA53835.2021.00049\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In practice, supervised learning algorithms require fully labeled datasets to achieve the high accuracy demanded by current modern applications. However, in industrial settings supervised learning algorithms can perform poorly because of few labeled instances. Semi-supervised learning (SSL) is an automatic labeling approach that utilizes complete labels to infer missing labels in partially complete datasets. The high number of available SSL algorithms and the lack of systematic comparison between them leaves practitioners without guidelines to select the appropriate one for their application. Moreover, each SSL algorithm is often validated and evaluated in a small number of common datasets. However, there is no research that examines what datasets are suitable for comparing different SSL algorihtms. The purpose of this paper is to empirically evaluate the suitability of the datasets commonly used to evaluate and compare different SSL algorithms. We performed a simulation study using twelve datasets of three different datatypes (numerical, text, image) on thirteen different SSL algorithms. The contributions of this paper are two-fold. First, we propose the use of Bayesian congeneric item response theory model to assess the suitability of commonly used datasets. Second, we compare the different SSL algorithms using these datasets. The results show that with except of three datasets, the others have very low discrimination factors and are easily solved by the current algorithms. Additionally, the SSL algorithms have overlapping 90% credible intervals, indicating uncertainty in the difference between the accuracy of these SSL models. The paper concludes suggesting that researchers and practitioners should better consider the choice of datasets used for comparing SSL algorithms.\",\"PeriodicalId\":435977,\"journal\":{\"name\":\"2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SEAA53835.2021.00049\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SEAA53835.2021.00049","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

在实践中，监督学习算法需要完全标记的数据集才能达到当前现代应用所要求的高精度。然而，在工业环境中，监督学习算法由于很少有标记实例而表现不佳。半监督学习(SSL)是一种自动标记方法，它利用完整标签来推断部分完整数据集中缺失的标签。可用的SSL算法数量众多，而且缺乏对它们之间的系统比较，这使得从业者在为其应用程序选择合适的算法时缺乏指导方针。此外，每个SSL算法通常在少量公共数据集中进行验证和评估。然而，没有研究检查哪些数据集适合比较不同的SSL算法。本文的目的是根据经验评估通常用于评估和比较不同SSL算法的数据集的适用性。我们在13种不同的SSL算法上使用3种不同数据类型(数字、文本、图像)的12个数据集进行了模拟研究。本文的贡献是双重的。首先，我们提出使用贝叶斯同属项目反应理论模型来评估常用数据集的适用性。其次，我们使用这些数据集比较不同的SSL算法。结果表明，除3个数据集外，其余数据集的判别因子都很低，用现有算法很容易求解。此外，SSL算法具有重叠的90%可信区间，这表明这些SSL模型的准确性差异存在不确定性。本文的结论建议，研究人员和从业人员应该更好地考虑用于比较SSL算法的数据集的选择。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Assessing the Suitability of Semi-Supervised Learning Datasets using Item Response Theory

In practice, supervised learning algorithms require fully labeled datasets to achieve the high accuracy demanded by current modern applications. However, in industrial settings supervised learning algorithms can perform poorly because of few labeled instances. Semi-supervised learning (SSL) is an automatic labeling approach that utilizes complete labels to infer missing labels in partially complete datasets. The high number of available SSL algorithms and the lack of systematic comparison between them leaves practitioners without guidelines to select the appropriate one for their application. Moreover, each SSL algorithm is often validated and evaluated in a small number of common datasets. However, there is no research that examines what datasets are suitable for comparing different SSL algorihtms. The purpose of this paper is to empirically evaluate the suitability of the datasets commonly used to evaluate and compare different SSL algorithms. We performed a simulation study using twelve datasets of three different datatypes (numerical, text, image) on thirteen different SSL algorithms. The contributions of this paper are two-fold. First, we propose the use of Bayesian congeneric item response theory model to assess the suitability of commonly used datasets. Second, we compare the different SSL algorithms using these datasets. The results show that with except of three datasets, the others have very low discrimination factors and are easily solved by the current algorithms. Additionally, the SSL algorithms have overlapping 90% credible intervals, indicating uncertainty in the difference between the accuracy of these SSL models. The paper concludes suggesting that researchers and practitioners should better consider the choice of datasets used for comparing SSL algorithms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)

自引率

0.00%

发文量