EMR: Scalable Clustering of Big HR Data using Evolutionary MapReduce

Companion Proceedings of the Web Conference 2021 Pub Date : 2021-04-19 DOI:10.1145/3442442.3453543

M. Bohlouli, Zhonghua He



Abstract

The volume and variety of generated data, and the question of how to process it and create value through scalable analytics, are major challenges for industry and for real-world practices such as talent analytics. For instance, large enterprises and job centres have to perform data-intensive matching of many job seekers to many job positions at the same time. In other words, the matching should result in the large-scale assignment of best-fit talents (Person) with the right expertise (Profession) to the right job (Position) at the right time (Period). We call this the 4P rule in this paper. All enterprises should consider the 4P rule in their daily recruitment processes in order to build efficient workforce development strategies. Doing so demands integrating large volumes of disparate data from various sources and requires scalable algorithms and analytics. The diversity of the data in human resource management requires speeding up analytical processes. The main challenge is not only how and where to store the data, but also how to analyse it to create value (knowledge discovery). In this paper, we propose a generic Career Knowledge Representation (CKR) model that can represent most of the competences that exist in a wide variety of careers. A regenerated job qualification dataset of 15 million employees with 84 dimensions (competences), derived from real HRM data, has been used to test and evaluate the proposed Evolutionary MapReduce K-Means (EMR) method. In experiments, the proposed EMR method produces faster and more accurate results than similar approaches; it has been tested with real large-scale datasets, and the results achieved are discussed.
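The abstract does not describe how the Evolutionary MapReduce K-Means method works internally, so the following is only a minimal sketch of how one ordinary K-means iteration can be split into a map step and a reduce step. It is plain K-means in a MapReduce shape, run in a single process, and it leaves out the evolutionary component entirely. All function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans_mapreduce`) and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(
    points: Iterable[Vector], centroids: Sequence[Vector]
) -> Iterator[Tuple[int, Tuple[List[float], int]]]:
    """Map: emit (cluster_id, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (list(point), 1)


def reduce_phase(
    pairs: Iterable[Tuple[int, Tuple[List[float], int]]],
    centroids: Sequence[Vector],
) -> List[List[float]]:
    """Reduce: sum the vectors and counts per cluster, then average them."""
    sums = {}
    counts = defaultdict(int)
    for cluster_id, (vector, n) in pairs:
        if cluster_id not in sums:
            sums[cluster_id] = [0.0] * len(vector)
        sums[cluster_id] = [s + v for s, v in zip(sums[cluster_id], vector)]
        counts[cluster_id] += n
    new_centroids = []
    for i, old in enumerate(centroids):
        if counts[i]:
            new_centroids.append([s / counts[i] for s in sums[i]])
        else:
            # A cluster that received no points keeps its previous centroid.
            new_centroids.append(list(old))
    return new_centroids


def kmeans_mapreduce(
    points: Sequence[Vector], centroids: Sequence[Vector], iterations: int = 10
) -> List[List[float]]:
    """Run a fixed number of map/reduce rounds and return the final centroids."""
    current = [list(c) for c in centroids]
    for _ in range(iterations):
        current = reduce_phase(map_phase(points, current), current)
    return current


if __name__ == "__main__":
    # Toy 2-dimensional data; the dataset in the abstract has 84 dimensions.
    toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    print(kmeans_mapreduce(toy_points, [[0.0, 0.0], [10.0, 10.0]], iterations=5))
```

In a real MapReduce deployment the map step would run in parallel over partitions of the data, and the reduce step would receive the emitted pairs grouped by cluster ID. Only the centroids need to be shared between rounds, which is why this decomposition can scale to datasets too large for one machine.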
