Talent Acquisition Process Optimization Using Machine Learning in Resumes’ Ranking and Matching to Job Descriptions
Mohammed Alghazal
Day 3 Tue, November 30, 2021. Published 2021-12-15. DOI: 10.2118/204534-ms (https://doi.org/10.2118/204534-ms)
Employers commonly search for potential job applicants using time-consuming screening tools or online matching engines driven by manual rules and predefined keywords. Such traditional techniques have not kept pace with recent advances in machine learning and big data analytics. This paper presents artificial intelligence solutions for ranking resumes and matching CVs to job descriptions.
Open-source resume and job description documents were used to construct and validate the machine learning models in this paper. The documents were converted to images and processed on Google Cloud with an Optical Character Recognition (OCR) algorithm, which extracted the text from all resumes and job descriptions with more than 97% accuracy. Prior to modeling, the extracted text was processed through a series of Natural Language Processing (NLP) steps: splitting the text into word tokens (tokenization), grouping together the inflected forms of a word (lemmatization), and removing stop words and punctuation marks.
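The text-cleaning steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the stop-word list, the lemma lookup table, and the function name `preprocess` are all hypothetical and deliberately tiny, standing in for the full resources an NLP library would provide.

```python
import re
from typing import List

# Tiny illustrative stop-word list (assumption; real pipelines use far larger lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}

# Toy lemma lookup table (assumption); a real lemmatizer relies on a full dictionary.
LEMMAS = {
    "managed": "manage",
    "managing": "manage",
    "engineers": "engineer",
    "teams": "team",
    "projects": "project",
    "built": "build",
}


def preprocess(text: str) -> List[str]:
    """Tokenize, drop punctuation and stop words, then map words to lemmas."""
    # Lowercase and keep alphabetic runs only, which also discards punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Replace each inflected form with its lemma when the table knows it.
    return [LEMMAS.get(t, t) for t in tokens]


tokens = preprocess("Managed teams of engineers, and built the projects.")
print(tokens)
```

The order of the steps matters little here, but removing stop words before the lemma lookup avoids looking up words that will be discarded anyway.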
After text processing, the unsupervised machine learning algorithm Latent Dirichlet Allocation (LDA) was trained on the resumes for topic modeling and categorization. Given the type of resumes used, the algorithm categorized them into 4 main job sectors: marketing and business, engineering, computer science/IT, and health. Each resume was assigned a score equal to its maximum LDA topic probability, which was used for ranking. A second, neural-network-based algorithm, Doc2Vec, was also trained to match resumes to relevant job descriptions. In this model, each resume is represented by a unique vector that can be used to group similar documents and to match and retrieve resumes related to a job description document provided by HR. The similarity between each resume and the given job description is measured to query the top job candidates. The model was tested against several job description files related to engineering, IT, and human resources, and was able to identify the top-ranking resumes among the hundreds of resumes used in training.
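The two scoring ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topic probabilities and document vectors are made-up numbers rather than output from trained LDA or Doc2Vec models, cosine similarity is assumed as the similarity measure (the abstract does not name one), and the function names `lda_score`, `cosine`, and `rank_resumes` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def lda_score(topic_probs: Sequence[float]) -> Tuple[int, float]:
    """Return the index of the dominant topic and its probability."""
    best = max(range(len(topic_probs)), key=lambda i: topic_probs[i])
    return best, topic_probs[best]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_resumes(
    job_vec: Sequence[float],
    resume_vecs: Dict[str, Sequence[float]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every resume vector against the job vector, highest first."""
    scored = [(name, cosine(job_vec, vec)) for name, vec in resume_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up topic distribution over 4 sectors for a single resume (assumption).
topic, score = lda_score([0.05, 0.80, 0.10, 0.05])

# Made-up 2-dimensional document vectors; real embeddings have many more dimensions.
resume_vecs = {
    "resume_a": [1.0, 0.0],
    "resume_b": [0.6, 0.8],
    "resume_c": [0.0, 1.0],
}
ranking = rank_resumes([1.0, 0.0], resume_vecs, top_n=2)
print(topic, score, ranking)
```

In this sketch the LDA score ranks resumes within a sector by how strongly they belong to it, while the vector ranking answers a different question: which resumes lie closest to one specific job description.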
This paper presents an innovative method for processing, categorizing, and ranking resumes using advanced computational models enabled by the latest Fourth Industrial Revolution technologies. The solution benefits both job seekers and employers by providing an efficient and unbiased data-driven method for finding top applicants for a given job.