Undeclared Work Prediction Using Machine Learning: Dealing with the Class Imbalance and Class Overlap Problems

2022 13th International Conference on Information, Intelligence, Systems & Applications (IISA). Pub Date: 2022-07-18. DOI: 10.1109/IISA56318.2022.9904366

Eleni Alogogianni, M. Virvou


Citations: 1

Abstract

Undeclared work is a complex and ever-changing problem that severely impacts society and the economy. It is one of the structural parts of the informal sector and undermines the well-being of workers and businesses as well as the foundations of the welfare state. Labour inspectorates are among the leading public institutions dealing with undeclared work, but they lack human and financial resources and appropriate tools. Yet they own large volumes of data produced by the increasing use of e-Government services and ICT tools. If properly processed and analysed with advanced machine learning techniques, these data can provide significant assistance in predicting undeclared work and understanding its features. Notably, classification algorithms can learn from datasets of past labour inspection findings and produce classifiers that effectively predict labour law violations and provide understandable explanations for these predictions. Still, undeclared work is usually underrepresented in such datasets, since its hidden and multifaceted nature means it is not often detected in onsite inspections. In addition, onsite inspection cases with similar characteristics often reveal different findings. These facts introduce class imbalance and class overlap into datasets of this application domain, which impede the machine learning process. The current research work focuses on data engineering techniques to address both issues. It uses data from real-life inspections and presents the effects of these techniques by creating several different classifiers and assessing their performance in predicting undeclared work, and it concludes by identifying the best approach.
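The abstract does not say which data engineering techniques the authors applied, so the following is only a minimal sketch of what class imbalance and class overlap can look like in inspection-style data, and of two generic remedies (dropping majority-class records from overlapping groups, then random oversampling of the minority class). The toy records, the feature names, and every function name here are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy records: (feature tuple, label); label 1 = undeclared work found.
# The features (sector, size band, region) are illustrative assumptions only.
RECORDS = [
    (("retail", "small", "north"), 0),
    (("retail", "small", "north"), 1),  # same features, different finding
    (("retail", "small", "north"), 0),
    (("catering", "small", "south"), 1),
    (("catering", "medium", "south"), 0),
    (("construction", "large", "north"), 0),
    (("construction", "small", "south"), 0),
    (("retail", "medium", "south"), 0),
]


def imbalance_ratio(records):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(label for _, label in records)
    return max(counts.values()) / min(counts.values())


def overlapping_groups(records):
    """Return the feature tuples that occur with more than one label."""
    labels_seen = defaultdict(set)
    for features, label in records:
        labels_seen[features].add(label)
    return {f for f, labels in labels_seen.items() if len(labels) > 1}


def drop_majority_in_overlap(records, minority_label=1):
    """Remove non-minority records whose features also occur with another label."""
    overlap = overlapping_groups(records)
    return [(f, y) for f, y in records if y == minority_label or f not in overlap]


def random_oversample(records, minority_label=1, seed=0):
    """Duplicate random minority records until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [r for r in records if r[1] == minority_label]
    majority = [r for r in records if r[1] != minority_label]
    if not minority:
        return list(records)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(records) + extra


if __name__ == "__main__":
    print("imbalance ratio:", imbalance_ratio(RECORDS))
    print("overlapping feature groups:", overlapping_groups(RECORDS))
    cleaned = drop_majority_in_overlap(RECORDS)
    balanced = random_oversample(cleaned)
    print("class counts after both steps:", Counter(y for _, y in balanced))
```

In practice, resampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original class distribution. Exact feature matching is also a simplification: real overlap usually involves similar rather than identical cases.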
