Saeed Ur Rehman Bhatti, M. Shaiq, Hira Sajid, Saad Ali Qureshi, Shujaat Hussain, Kifayat-Ullah Khan
2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM), 2022-01-03. DOI: 10.1109/imcom53663.2022.9721722
Resu-mizer: Hybrid Resume Information Retrieval System
In the modern era, there is a need for efficient Information Retrieval (IR) from large numbers of documents, which in turn motivates automation. Over the past decade, data influx has increased exponentially, particularly for unstructured data such as images, videos, and textual documents. Textual sources such as resumes follow no standard format, and hence their structure is subject to the author's subjective choices. Current automated information extraction techniques, on the other hand, assume a standard document format. Previous researchers have employed rule-based, supervised, and semantics-based methods to extract entities from resumes. However, these methods depend heavily on large amounts of data, which is usually in an unstructured format. Furthermore, these techniques are time-intensive and have other limitations. Our study adopts a two-step hybrid Information Retrieval methodology: first, text block classification using Boolean Naive Bayes with Laplacian smoothing and a tri-gram approach; second, entity recognition using BERT-cased. Our approach achieved an average F1 score of 0.80 for text block classification and an average F1 score of 0.52 for Named Entity Recognition.
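The text block classification step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "Boolean" means each tri-gram is counted at most once per text block (presence rather than frequency), that the tri-grams are word-level, and that Laplacian smoothing is add-one smoothing over the tri-gram vocabulary. The class name `BooleanNaiveBayes`, the helper `trigrams`, the labels, and the toy training blocks are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Return the set of word tri-grams in a text block (presence only)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


class BooleanNaiveBayes:
    """Binarized Naive Bayes with add-alpha (Laplace) smoothing. Illustrative only."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, blocks, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(blocks, labels):
            feats = trigrams(text)  # a set, so each tri-gram counts once per block
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, text):
        feats = trigrams(text) & self.vocab  # skip tri-grams unseen in training
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / total)  # log prior of the class
            for f in feats:
                # Smoothed likelihood: never zero, even for tri-grams
                # that did not occur with this class.
                score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data: resume text blocks with section labels.
train_blocks = [
    "bachelor of science in computer science",
    "master of science in data science",
    "worked as a software engineer at acme",
    "worked as a data analyst at initech",
]
train_labels = ["education", "education", "experience", "experience"]

model = BooleanNaiveBayes().fit(train_blocks, train_labels)
print(model.predict("bachelor of science in physics"))  # → education
```

Without smoothing, a single tri-gram never seen with a class would drive that class's probability to zero. The added `alpha` keeps every likelihood positive, which matters for resumes because their wording varies widely between authors.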