Undeclared Work Prediction Using Machine Learning: Dealing with the Class Imbalance and Class Overlap Problems
Authors: Eleni Alogogianni, M. Virvou
Published in: 2022 13th International Conference on Information, Intelligence, Systems & Applications (IISA), 2022-07-18
DOI: https://doi.org/10.1109/IISA56318.2022.9904366
Undeclared work is a complex and ever-changing problem that severely impacts society and the economy. It is one of the structural parts of the informal sector and undermines the well-being of workers and businesses as well as the foundations of the welfare state. Labour inspectorates are among the leading public institutions dealing with undeclared work, but they lack human and financial resources and appropriate tools. Yet they hold large volumes of data produced by the increasing use of e-Government services and ICT tools. If properly processed and analysed with advanced machine learning techniques, these data can provide significant assistance in predicting undeclared work and understanding its features. Notably, classification algorithms can learn from datasets of past labour inspection findings and produce classifiers that effectively predict labour law violations and provide understandable explanations for these predictions. Still, undeclared work is usually underrepresented in such datasets, since its hidden and multifaceted nature means it is not often detected in onsite inspections. In addition, onsite inspection cases with similar characteristics often reveal different findings. These facts introduce class imbalance and class overlap into datasets of this application domain, and both impede the machine learning process. The current research work focuses on data engineering techniques to address them. It uses data from real-life inspections and presents the effects of these techniques by building several different classifiers, assessing their performance in predicting undeclared work, and identifying the best approach.
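
To make the two dataset problems concrete, the following minimal Python sketch measures class imbalance and class overlap on a small invented dataset and then applies random oversampling. It is an illustration only: the abstract does not name the techniques or the data the authors used, so the records, the feature names, the overlap measure, and the helper functions `imbalance_ratio`, `overlap_fraction`, and `random_oversample` are all assumptions made for this example.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def overlap_fraction(rows, labels):
    """Share of records whose exact feature values also occur with another label."""
    labels_seen = {}
    for row, label in zip(rows, labels):
        labels_seen.setdefault(tuple(row), set()).add(label)
    conflicted = sum(1 for row in rows if len(labels_seen[tuple(row)]) > 1)
    return conflicted / len(rows)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen records of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical inspection records: (sector, company size).
# Label 1 = undeclared work found, 0 = no finding. Invented for illustration.
rows = [
    ("retail", "small"), ("retail", "small"), ("retail", "small"),
    ("food", "small"), ("food", "small"), ("food", "large"),
    ("construction", "large"), ("construction", "large"),
]
labels = [0, 0, 1, 0, 0, 0, 0, 1]

print(imbalance_ratio(labels))         # 3.0: six negatives for two positives
print(overlap_fraction(rows, labels))  # 0.625: five of eight records share features across labels

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Random oversampling is used here only because it is the simplest rebalancing step to show. Note that it changes the class counts but leaves the overlapping records in place, which is why imbalance and overlap are usually treated as separate problems.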