Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets

Academic Platform Journal of Engineering and Science. Pub Date: 2019-09-28. DOI: 10.21541/apjes.459447

Zekeriya Anil Guven, B. Diri, Tolgahan Cakaloglu

Citations: 5

Abstract

Understanding the reasons behind emotions expressed on social media is key to characterizing the mood of previously unseen texts, and the ability to classify mood makes the technique useful in a variety of fields. In this study, Latent Dirichlet Allocation (LDA), a topic modeling algorithm, was used to determine which emotions tweets on Twitter express. The dataset consists of 4000 tweets categorized into 5 emotions: anger, fear, happiness, sadness, and surprise. Models were created using three root extraction methods: Zemberek, Snowball, and taking the first 5 letters of each word. The generated models were tested with the proposed n-stage LDA method, which aims to increase the model's success rate by decreasing the number of words in the dictionary. With multi-stage LDA, we achieved better results (2-stage: 70.5%, 3-stage: 76.4%) than the state-of-the-art result (60.4%), which was obtained using plain LDA for 5 classes.
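The simplest of the three root extraction methods, keeping the first 5 letters of each word, can be illustrated with a minimal sketch. It is not the authors' code: the helper names `first5_root` and `build_dictionary`, the whitespace tokenization, and the toy tweets are all assumptions made for illustration. It only shows how truncating words shrinks the dictionary, which is the quantity the n-stage method aims to reduce; it does not implement LDA or the n-stage procedure itself.

```python
from collections import Counter

def first5_root(word, n=5):
    """Approximate a word's root by keeping its first n letters (hypothetical helper)."""
    return word[:n]

def build_dictionary(tweets, stem=lambda w: w):
    """Count tokens after applying `stem`; tokens are split on whitespace.

    Assumes the tweets are already lowercased and stripped of punctuation.
    """
    counts = Counter()
    for tweet in tweets:
        counts.update(stem(token) for token in tweet.split())
    return counts

# Hypothetical toy tweets, not taken from the paper's dataset.
tweets = [
    "mutluyum çok mutluyum",
    "mutluluk güzel",
    "korkuyorum korkunç",
]

raw_dictionary = build_dictionary(tweets)
stemmed_dictionary = build_dictionary(tweets, stem=first5_root)

# Inflected forms that share a 5-letter prefix collapse into one entry.
print(len(raw_dictionary), len(stemmed_dictionary))
```

Prefix truncation needs no language resources, which makes it a cheap baseline next to morphological tools such as Zemberek or a Snowball stemmer, at the cost of occasionally merging unrelated words that happen to share a prefix.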
