Twitter Storytelling Generator Using Latent Dirichlet Allocation and Hidden Markov Model POS-TAG (Part-of-Speech Tagging)

2019 3rd International Conference on Informatics and Computational Sciences (ICICoS). Pub Date: 2019-10-01. DOI: 10.1109/ICICoS48119.2019.8982411

Yasir Abdur Rohman, R. Kusumaningrum


Citations: 2

Abstract

In 2015, Twitter had 50 million active users in Indonesia out of 284 million worldwide. In January 2019, active users on Twitter increased by 52% compared with 2018, when active users were only 27%. This large user base drives a growing volume of tweet documents. Tweets that contain information such as user activity, news, and stories can be processed into valuable information for journalists: the collected information is arranged, by grouping related tweets, into a story that becomes a news item or article. This process is still done manually, one tweet at a time, and most of the tweets are collected from trending topics. It should instead be done automatically by collecting tweets that share the same topic. This research therefore proposes a Twitter storytelling generator that combines Latent Dirichlet Allocation (LDA) and Hidden Markov Model POS-TAG (Part-of-Speech Tagging) to generate a Twitter story for a given topic. We ran two experimental scenarios. The first computed perplexity on LDA and HMM POS-TAG; the lowest perplexity, 6.31, was obtained with alpha = 0.001, beta = 0.001, and 4 topics. The second computed ROUGE-1, ROUGE-2, BLEU-1, and BLEU-2 on the output of the storytelling generator: the best ROUGE-1 was 0.470 at a beta cap value of 0.1, and the best ROUGE-2 was 0.149 at a beta cap value of 0.001. The best BLEU-1 was 0.617 on topic 1, and the best BLEU-2 was 0.432 on topic 3. The proposed generator performs well when HMM POS-TAG tags the tweet documents correctly.
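As a rough illustration of the overlap metrics named in the abstract, the Python sketch below computes ROUGE-N recall and the clipped n-gram precision that underlies BLEU-N. It assumes the plain textbook variants; the abstract does not say whether ROUGE was reported as recall or F-score, or whether BLEU included a brevity penalty, so this is not necessarily the authors' evaluation code. The function names and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: overlapping n-grams divided by n-grams in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the per-n-gram minimum count (the clipped overlap).
    overlap = sum((cand & ref).values())
    return overlap / total


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision used inside BLEU-N (no brevity penalty here)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum((cand & ref).values())
    return overlap / total


# Hypothetical example sentences, not data from the paper.
reference = "the flood closed the main road in the city".split()
candidate = "the flood closed a road in the city".split()

if __name__ == "__main__":
    for n in (1, 2):
        print(f"ROUGE-{n} recall:      {rouge_n_recall(candidate, reference, n):.3f}")
        print(f"BLEU-{n} precision:    {clipped_precision(candidate, reference, n):.3f}")
```

Both scores count the same clipped overlap and differ only in the denominator: ROUGE divides by the reference length, so it rewards coverage, while the BLEU-style precision divides by the candidate length, so it penalizes generated words that the reference does not contain.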
