Authors: Ercan Atagün, Bengisu Hartoka, A. Albayrak
Venue: 2021 6th International Conference on Computer Science and Engineering (UBMK)
Published: 2021-09-15
DOI: 10.1109/UBMK52708.2021.9558988
Topic Modeling Using LDA and BERT Techniques: Teknofest Example
This paper is a natural language processing (NLP) study of topic modeling, one of the subfields of NLP. The dataset, which relates to Teknofest competitions, was collected from social media by data scraping, a method that has become very popular in recent years. Scraping was carried out with Selenium, one of the popular libraries for this purpose. To make the prepared dataset suitable for analysis and to keep the clustering consistent, the text was preprocessed before analysis. After preprocessing, the dataset was clustered with natural language processing techniques such as BERT and LDA.
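The preprocessing and LDA steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the cleaning rules, the function names (`preprocess`, `lda_gibbs`, `top_words`), the hyperparameters, and the toy posts are all assumptions made for illustration, and LDA is fitted here with a small collapsed Gibbs sampler written with the standard library only. The Selenium scraping and the BERT-based clustering are omitted because they depend on a live site and a pretrained model.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not say which one was used.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "at", "our", "we"}


def preprocess(text):
    """Lowercase, strip URLs, mentions, punctuation and digits, then drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^\w\s]|\d", " ", text)    # remove punctuation (incl. '#') and digits
    return [t for t in text.split() if t not in STOPWORDS and len(t) > 1]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assign = []
    for d, doc in enumerate(docs):                       # random initial assignment
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                         # take the token out of the counts
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [                              # p(topic | all other assignments)
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k                         # put it back under the new topic
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic, ignoring words whose count dropped to zero."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


if __name__ == "__main__":
    # Invented example posts, not data from the paper.
    posts = [
        "Our rocket team is ready for #Teknofest! https://example.com",
        "Rocket engine test passed @teammate",
        "Drone racing finals at Teknofest",
        "The drone swarm category is open",
    ]
    docs = [preprocess(p) for p in posts]
    doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
    for k, words in enumerate(top_words(topic_word)):
        print(f"topic {k}: {words}")
```

On a real corpus a maintained implementation such as gensim or scikit-learn would normally replace the hand-written sampler; the pure-Python version is used here only to keep the sketch self-contained.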