Y. Dogan, Feriştah Dalkılıç, R. A. Kut, K. C. Kara, Uygar Takazoğlu
{"title":"利用混合聚类算法发现用不同句子表达的相同招聘广告","authors":"Y. Dogan, Feriştah Dalkılıç, R. A. Kut, K. C. Kara, Uygar Takazoğlu","doi":"10.18100/IJAMEC.797572","DOIUrl":null,"url":null,"abstract":"Text mining studies on job ads have become widespread in recent years to determine the qualifications required for each position. It can be said that the researches made for Turkish are limited while a large resource pool is encountered for the English language. Kariyer.Net is the biggest company for the job ads in Turkey and 99% of the ads are Turkish. Therefore, there is a necessity to develop novel Natural Language Processing (NLP) models in Turkish for analysis of this big database. In this study, the job ads of Kariyer.Net have been analyzed, and by using a hybrid clustering algorithm, the hidden associations in this dataset as the big data have been discovered. Firstly, all ads in the form of HTML codes have been transformed into regular sentences by the means of extracting HTML codes to inner texts. Then, these inner texts containing the core ads have been converted into the sub ads by traditional methods. After these NLP steps, hybrid clustering algorithms have been used and the same ads expressed with the different sentences could be managed to be detected. For the analysis, 57 positions about Information Technology sectors with 6,897 ad texts have been focused on. As a result, it can be claimed that the clusters obtained contain useful outcomes and the model proposed can be used to discover common and unique ads for each position.","PeriodicalId":120305,"journal":{"name":"International Journal of Applied Mathematics Electronics and Computers","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Discovering the same job ads expressed with the different sentences by using hybrid clustering algorithms\",\"authors\":\"Y. Dogan, Feriştah Dalkılıç, R. A. Kut, K. C. Kara, Uygar Takazoğlu\",\"doi\":\"10.18100/IJAMEC.797572\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text mining studies on job ads have become widespread in recent years to determine the qualifications required for each position. It can be said that the researches made for Turkish are limited while a large resource pool is encountered for the English language. Kariyer.Net is the biggest company for the job ads in Turkey and 99% of the ads are Turkish. Therefore, there is a necessity to develop novel Natural Language Processing (NLP) models in Turkish for analysis of this big database. In this study, the job ads of Kariyer.Net have been analyzed, and by using a hybrid clustering algorithm, the hidden associations in this dataset as the big data have been discovered. Firstly, all ads in the form of HTML codes have been transformed into regular sentences by the means of extracting HTML codes to inner texts. Then, these inner texts containing the core ads have been converted into the sub ads by traditional methods. After these NLP steps, hybrid clustering algorithms have been used and the same ads expressed with the different sentences could be managed to be detected. For the analysis, 57 positions about Information Technology sectors with 6,897 ad texts have been focused on. As a result, it can be claimed that the clusters obtained contain useful outcomes and the model proposed can be used to discover common and unique ads for each position.\",\"PeriodicalId\":120305,\"journal\":{\"name\":\"International Journal of Applied Mathematics Electronics and Computers\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Applied Mathematics Electronics and Computers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18100/IJAMEC.797572\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Applied Mathematics Electronics and Computers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18100/IJAMEC.797572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Discovering the same job ads expressed with the different sentences by using hybrid clustering algorithms
Text mining studies on job ads have become widespread in recent years to determine the qualifications required for each position. It can be said that the researches made for Turkish are limited while a large resource pool is encountered for the English language. Kariyer.Net is the biggest company for the job ads in Turkey and 99% of the ads are Turkish. Therefore, there is a necessity to develop novel Natural Language Processing (NLP) models in Turkish for analysis of this big database. In this study, the job ads of Kariyer.Net have been analyzed, and by using a hybrid clustering algorithm, the hidden associations in this dataset as the big data have been discovered. Firstly, all ads in the form of HTML codes have been transformed into regular sentences by the means of extracting HTML codes to inner texts. Then, these inner texts containing the core ads have been converted into the sub ads by traditional methods. After these NLP steps, hybrid clustering algorithms have been used and the same ads expressed with the different sentences could be managed to be detected. For the analysis, 57 positions about Information Technology sectors with 6,897 ad texts have been focused on. As a result, it can be claimed that the clusters obtained contain useful outcomes and the model proposed can be used to discover common and unique ads for each position.