Talent Acquisition Process Optimization Using Machine Learning in Resumes' Ranking and Matching to Job Descriptions

Day 3 Tue, November 30, 2021. Pub Date: 2021-12-15. DOI: 10.2118/204534-ms

Mohammed Alghazal


Citations: 0

Abstract



Employers commonly search for potential job applicants with time-consuming screening tools or online matching engines driven by manual rules and predefined keywords. Such traditional techniques have not kept pace with the digital revolution in machine learning and big data analytics. This paper presents advanced artificial intelligence solutions for ranking resumes and for CV-to-job-description matching.

Open-source resumes and job description documents were used to construct and validate the machine learning models in this paper. Documents were converted to images and processed via Google Cloud using an Optical Character Recognition (OCR) algorithm to extract text from all resume and job description documents, with more than 97% accuracy. Prior to modeling, the extracted text was processed with a series of Natural Language Processing (NLP) techniques: splitting the text into word tokens, grouping together inflected forms of words (lemmatization), and removing stop words and punctuation marks.

After text processing, the unsupervised machine learning algorithm Latent Dirichlet Allocation (LDA) was trained on the resumes for topic modeling and categorization. Given the type of resumes used, the algorithm categorized them into four main job sectors: marketing and business, engineering, computer science/IT, and health. Each resume was assigned a score equal to its maximum LDA topic probability, which was used for ranking. A second, more advanced deep learning algorithm, Doc2Vec, was also trained to match resumes to relevant job descriptions. In this model, each resume is represented by a unique vector that can be used to group similar documents and to match and retrieve resumes related to a job description document provided by HR. The similarity between each resume and the given job description file is measured to query the top job candidates. The model was tested against several job description files related to engineering, IT, and human resources, and identified the top-ranking resumes from among the hundreds of resumes used in training.

This paper presents an innovative method for processing, categorizing, and ranking resumes using advanced computational models empowered by the latest fourth industrial revolution technologies. The solution benefits both job seekers and employers by providing an efficient and unbiased data-driven method for finding top applicants for a given job.
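The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: plain term-count vectors stand in for Doc2Vec embeddings, lemmatization is omitted, and the stop-word list, the sample documents, and all function names (preprocess, rank_resumes, lda_style_score) are assumptions made for illustration. The topic distribution passed to lda_style_score is supplied by hand rather than produced by a trained LDA model.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on", "is", "are"}


def preprocess(text):
    """Lowercase, tokenize on letter runs (dropping punctuation), remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(resumes, job_description):
    """Return (resume_id, similarity) pairs sorted from most to least similar."""
    jd_vec = Counter(preprocess(job_description))
    scored = [
        (rid, cosine_similarity(Counter(preprocess(text)), jd_vec))
        for rid, text in resumes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def lda_style_score(topic_probs):
    """Given one resume's topic distribution, return (best_topic, max_probability)."""
    best = max(topic_probs, key=topic_probs.get)
    return best, topic_probs[best]


# Hypothetical sample documents, invented for this sketch.
resumes = {
    "resume_eng": "Mechanical engineer with experience in pipeline design and reservoir simulation.",
    "resume_it": "Software developer skilled in Python, databases, and cloud infrastructure.",
    "resume_mkt": "Marketing specialist focused on brand strategy and business development.",
}
job_description = "Seeking a software developer with Python and cloud experience."

ranking = rank_resumes(resumes, job_description)
for resume_id, score in ranking:
    print(f"{resume_id}: {score:.3f}")
```

In a fuller pipeline, the Counter vectors would presumably be replaced by learned document vectors and the hand-written topic distribution by a trained topic model's output; the sort-by-similarity and take-the-maximum-probability steps would stay the same.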
