Sentiment Analysis of Floods on Twitter Social Media Using the Naive Bayes Classifier Method with the N-Gram Feature

Akbar Ridwan, H. Nuha, Ramanti Dharayani
Published in: 2022 International Conference on Data Science and Its Applications (ICoDSA), 2022-07-06
DOI: 10.1109/ICoDSA55874.2022.9862827 (https://doi.org/10.1109/ICoDSA55874.2022.9862827)
Citations: 2

Abstract

Indonesia has the sixth-largest population affected by floods in the world, about 640,000 people every year. Parts of Indonesia flood frequently because of high-intensity rainfall and the tropical climate. A recent example is the flood in South Kalimantan on January 14, 2021. After this incident, a few netizens expressed their opinions about the flood disaster on Twitter. In this study, we classify netizens' views on the flood so that the public is aware of the incident and can help prevent the causes of flooding. We divide the tweets into relevant and irrelevant categories using the Naïve Bayes classifier. We use N-gram features and compare them to find the most effective setting for classification. We chose Naïve Bayes because it treats all variables as independent of one another, while the N-gram features assign weights to the text data. These weights are then used to build a Naïve Bayes classification model that computes the class probabilities. The results show that the Naïve Bayes method can be applied to classifying tweets about flood disasters, and that bigram features give higher accuracy than unigram or trigram features. Based on this study, the government can plan future mitigation actions.
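The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their preprocessing, smoothing, or data, so the multinomial model with Laplace smoothing, the whitespace tokenizer, the class name NGramNaiveBayes, and the four English training sentences below are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Lowercase, split on whitespace, and return word-level n-grams as tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over n-gram counts (Laplace smoothing)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram size: 1 = unigram, 2 = bigram, 3 = trigram
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = ngrams(text, self.n)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(n-gram | class) for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)
            for gram in ngrams(text, self.n):
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data; the study itself used tweets about the South Kalimantan flood.
TRAIN = [
    ("flood water rising in our village", "relevant"),
    ("heavy rain caused flood water downtown", "relevant"),
    ("my timeline is a flood of memes", "irrelevant"),
    ("a flood of memes after the match", "irrelevant"),
]

model = NGramNaiveBayes(n=2).fit(*zip(*TRAIN))
print(model.predict("flood water near the river"))
print(model.predict("flood of memes again"))
```

Passing n=1 or n=3 to the constructor switches the same model to unigram or trigram features, which is how the three settings could be compared on a labeled test set.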