Discovering the same job ads expressed with the different sentences by using hybrid clustering algorithms

International Journal of Applied Mathematics Electronics and Computers. Pub Date: 2020-09-30. DOI: 10.18100/IJAMEC.797572

Y. Dogan, Feriştah Dalkılıç, R. A. Kut, K. C. Kara, Uygar Takazoğlu

Citations: 1

Abstract

Text mining studies on job ads have become widespread in recent years as a way to determine the qualifications required for each position. Research on Turkish remains limited, whereas a large pool of resources exists for English. Kariyer.Net is the largest job-ad company in Turkey, and 99% of its ads are in Turkish. There is therefore a need to develop novel Natural Language Processing (NLP) models for Turkish to analyze this large database. In this study, the job ads of Kariyer.Net are analyzed, and a hybrid clustering algorithm is used to discover the hidden associations in this big-data dataset. First, all ads, which are stored as HTML, are transformed into regular sentences by extracting the inner text from the HTML code. Then, these inner texts, which contain the core ads, are converted into sub ads by traditional methods. After these NLP steps, hybrid clustering algorithms are applied to detect the same ads expressed with different sentences. The analysis focuses on 57 positions in the Information Technology sector, comprising 6,897 ad texts. The results suggest that the clusters obtained contain useful outcomes and that the proposed model can be used to discover common and unique ads for each position.
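The three steps named in the abstract (HTML to inner text, inner text to sub ads, clustering of sub ads) can be illustrated with a minimal Python sketch. The abstract does not specify the "traditional methods" or the hybrid clustering algorithm, so everything here is an assumption: sub ads are approximated by simple sentence splitting, and the clustering is a greedy grouping by Jaccard similarity over character trigrams, used as a stand-in rather than the authors' method. All function names and the 0.5 threshold are hypothetical, and the sample ads are invented English text rather than Kariyer.Net data.

```python
import re
from html.parser import HTMLParser


class InnerTextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def html_to_sub_ads(html):
    """Strip the markup, then split the inner text into sentence-like sub ads.

    Assumption: a sub ad is one sentence; the paper may segment differently.
    """
    parser = InnerTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    # Split on sentence punctuation followed by whitespace or end of text.
    pieces = re.split(r"[.!?;]+(?:\s+|$)|\n+", text)
    return [p.strip() for p in pieces if p.strip()]


def trigrams(sentence):
    """Character trigrams of the lowercased, whitespace-normalized sentence."""
    s = " ".join(sentence.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sub_ads(sub_ads, threshold=0.5):
    """Greedy stand-in clustering (not the paper's hybrid algorithm).

    Each sub ad joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for ad in sub_ads:
        grams = trigrams(ad)
        for cluster in clusters:
            if jaccard(grams, trigrams(cluster[0])) >= threshold:
                cluster.append(ad)
                break
        else:
            clusters.append([ad])
    return clusters


# Invented sample ads for illustration only.
ads = [
    "<ul><li>Experience with Java.</li><li>Good knowledge of SQL.</li></ul>",
    "<p>Experience with Java 8.</p><p>Team player</p>",
]
sub_ads = [s for ad in ads for s in html_to_sub_ads(ad)]
clusters = cluster_sub_ads(sub_ads)
for cluster in clusters:
    print(cluster)
```

Clusters with more than one member correspond to a requirement that appears in several ads in slightly different wording, while singleton clusters are candidates for requirements unique to one ad. Character trigrams were chosen here only because they tolerate small spelling and suffix variations without a language-specific stemmer; a real Turkish pipeline would likely need proper morphological processing.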
