Topic Modeling Using LDA and BERT Techniques: Teknofest Example

2021 6th International Conference on Computer Science and Engineering (UBMK) Pub Date : 2021-09-15 DOI:10.1109/UBMK52708.2021.9558988

Ercan Atagün, Bengisu Hartoka, A. Albayrak


Citations: 7

Abstract


This paper is a natural language processing (NLP) study of topic modeling, one of the subfields of NLP. The dataset for topic modeling was obtained by scraping social media, a data collection method that has become very popular in recent years. The dataset relates to the Teknofest competitions and was created with the Selenium library, one of the popular libraries for data scraping. To enable analysis of the prepared dataset and to ensure the consistency of the clustering process, the text was preprocessed before analysis. After text preprocessing, clustering was performed on the dataset with natural language processing techniques such as BERT and LDA.
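
The abstract does not list the preprocessing steps, so the following is only a minimal sketch of what cleaning scraped social media text before topic modeling might look like. The function name `preprocess`, the stopword list, and the choice of steps (lowercasing, removing URLs, mentions, punctuation and digits) are all assumptions for illustration, not the authors' pipeline. Language-specific handling such as stemming or Turkish-aware casing is left out.

```python
import re
import string

# Hypothetical stopword list for illustration; the paper does not publish its own.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Map every ASCII punctuation character (including '#') to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(text):
    """Return a list of cleaned tokens from one scraped post (assumed steps)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = text.translate(PUNCT_TABLE) # drop punctuation, keep hashtag words
    return [
        tok for tok in text.split()
        if tok not in STOPWORDS and not tok.isdigit()
    ]


if __name__ == "__main__":
    post = "Check the #Teknofest rocket competition at https://example.com @user!"
    print(preprocess(post))
```

Token lists produced this way could then be passed to an LDA implementation or joined back into strings for a BERT-based embedding model; which libraries the authors used for those steps is not stated in the abstract.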
