Twitter Storytelling Generator Using Latent Dirichlet Allocation and Hidden Markov Model POS-TAG (Part-of-Speech Tagging)
Authors: Yasir Abdur Rohman, R. Kusumaningrum
Published in: 2019 3rd International Conference on Informatics and Computational Sciences (ICICoS), 2019-10-01
DOI: 10.1109/ICICoS48119.2019.8982411 (https://doi.org/10.1109/ICICoS48119.2019.8982411)
In 2015, Twitter had 50 million active users in Indonesia out of 284 million worldwide. In January 2019, active users on Twitter increased to 52%, compared with only 27% in 2018. The growing number of users increases the number of tweet documents. Tweet documents that contain information such as user activity, news, and stories can be processed into valuable information for journalists. The collected information is then arranged, based on related tweets, into a storytelling that becomes a news item or article. This whole process is still done manually, by collecting tweets one by one, and most of the tweet documents are collected from trending topics. Ideally, it should be done automatically by collecting tweets that share the same topic. This research therefore proposes a Twitter storytelling generator that combines Latent Dirichlet Allocation (LDA) and Hidden Markov Model POS-TAG (Part-of-Speech Tagging), so that it can generate Twitter storytelling for a given topic. We ran two experimental scenarios. The first computed the perplexity of LDA and HMM POS-TAG, yielding a lowest perplexity of 6.31 with alpha 0.001, beta 0.001, and 4 topics. The second computed ROUGE-1, ROUGE-2, BLEU-1, and BLEU-2 on the output of the Twitter storytelling generator: the best ROUGE-1 value was 0.470 with a beta cap value of 0.1, and the best ROUGE-2 value was 0.149 with a beta cap value of 0.001. The best BLEU-1 value was 0.617 on topic 1, and the best BLEU-2 value was 0.432 on topic 3. The Twitter storytelling generator using the proposed method performs well when HMM POS-TAG tags the tweet documents correctly.
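The ROUGE-N and BLEU-N scores reported above both rest on counting n-grams shared between a generated text and a reference text. The following is a minimal sketch of that counting, not the authors' evaluation code: the abstract does not say which ROUGE variant (recall, precision, or F-score) or which BLEU smoothing and brevity penalty were used, so this sketch assumes recall for ROUGE-N and shows only the clipped n-gram precision that BLEU-N is built on. The function names and the whitespace tokenization are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_overlap(candidate, reference, n):
    """Count n-grams shared by candidate and reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference (the "clipping" used by BLEU).
    """
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    return sum(min(count, ref[gram]) for gram, count in cand.items())


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N as recall: shared n-grams / n-grams in the reference (assumed variant)."""
    total = sum(ngrams(reference, n).values())
    return clipped_overlap(candidate, reference, n) / total if total else 0.0


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: shared n-grams / n-grams in the candidate.

    This is the core term of BLEU-N; full BLEU also applies a brevity
    penalty and averages over n-gram orders, both omitted here.
    """
    total = sum(ngrams(candidate, n).values())
    return clipped_overlap(candidate, reference, n) / total if total else 0.0


if __name__ == "__main__":
    # Toy example with whitespace tokenization (an assumption, not the paper's data).
    reference = "the cat sat on the mat".split()
    candidate = "the cat is on the mat".split()
    for n in (1, 2):
        print(n, rouge_n_recall(candidate, reference, n),
              ngram_precision(candidate, reference, n))
```

Under these assumptions, a short generated story that copies a few reference words exactly scores high on precision but low on recall, which is one reason the two families of metrics are usually reported together.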