Discovering the same job ads expressed with the different sentences by using hybrid clustering algorithms

Y. Dogan, Feriştah Dalkılıç, R. A. Kut, K. C. Kara, Uygar Takazoğlu
{"title":"Discovering the same job ads expressed with the different sentences by using hybrid clustering algorithms","authors":"Y. Dogan, Feriştah Dalkılıç, R. A. Kut, K. C. Kara, Uygar Takazoğlu","doi":"10.18100/IJAMEC.797572","DOIUrl":null,"url":null,"abstract":"Text mining studies on job ads have become widespread in recent years to determine the qualifications required for each position. It can be said that the researches made for Turkish are limited while a large resource pool is encountered for the English language. Kariyer.Net is the biggest company for the job ads in Turkey and 99% of the ads are Turkish. Therefore, there is a necessity to develop novel Natural Language Processing (NLP) models in Turkish for analysis of this big database. In this study, the job ads of Kariyer.Net have been analyzed, and by using a hybrid clustering algorithm, the hidden associations in this dataset as the big data have been discovered. Firstly, all ads in the form of HTML codes have been transformed into regular sentences by the means of extracting HTML codes to inner texts. Then, these inner texts containing the core ads have been converted into the sub ads by traditional methods. After these NLP steps, hybrid clustering algorithms have been used and the same ads expressed with the different sentences could be managed to be detected. For the analysis, 57 positions about Information Technology sectors with 6,897 ad texts have been focused on. As a result, it can be claimed that the clusters obtained contain useful outcomes and the model proposed can be used to discover common and unique ads for each position.","PeriodicalId":120305,"journal":{"name":"International Journal of Applied Mathematics Electronics and Computers","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Applied Mathematics Electronics and Computers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18100/IJAMEC.797572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Text mining studies on job ads have become widespread in recent years to determine the qualifications required for each position. It can be said that the researches made for Turkish are limited while a large resource pool is encountered for the English language. Kariyer.Net is the biggest company for the job ads in Turkey and 99% of the ads are Turkish. Therefore, there is a necessity to develop novel Natural Language Processing (NLP) models in Turkish for analysis of this big database. In this study, the job ads of Kariyer.Net have been analyzed, and by using a hybrid clustering algorithm, the hidden associations in this dataset as the big data have been discovered. Firstly, all ads in the form of HTML codes have been transformed into regular sentences by the means of extracting HTML codes to inner texts. Then, these inner texts containing the core ads have been converted into the sub ads by traditional methods. After these NLP steps, hybrid clustering algorithms have been used and the same ads expressed with the different sentences could be managed to be detected. For the analysis, 57 positions about Information Technology sectors with 6,897 ad texts have been focused on. As a result, it can be claimed that the clusters obtained contain useful outcomes and the model proposed can be used to discover common and unique ads for each position.
利用混合聚类算法发现用不同句子表达的相同招聘广告
近年来,对招聘广告进行文本挖掘研究,以确定每个职位所需的资格要求,这种研究已经变得非常普遍。可以说,对土耳其语的研究是有限的,而对英语语言的研究则遇到了很大的资源池。Kariyer。Net是土耳其最大的招聘广告公司,99%的广告都是土耳其语的。因此,有必要开发新的土耳其语自然语言处理(NLP)模型来分析这个庞大的数据库。在本研究中,卡里耶的招聘广告。并利用混合聚类算法,发现了该数据集作为大数据所隐藏的关联。首先,将所有HTML代码形式的广告通过提取HTML代码到内部文本的方式转化为规则的句子。然后,将这些包含核心广告的内部文本通过传统方法转换为子广告。在这些NLP步骤之后,使用混合聚类算法,可以设法检测到用不同句子表达的相同广告。该分析集中了信息技术(it)领域的57个职位和6897个广告文本。因此,可以声称获得的聚类包含有用的结果,并且所提出的模型可用于发现每个职位的常见和唯一广告。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信