Lexicon Addition Effect on Lexicon-Based of Indonesian Sentiment Analysis on Twitter

2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS) | Pub Date: 2020-11-19 | DOI: 10.1109/ICIMCIS51567.2020.9354269

F. Saputra, S. Wijaya, Yani Nurhadryani, Defina

Citations: 1

Abstract

The volume of opinions on social media such as Twitter is so large that it is not feasible to read them all and judge their sentiment (positive, negative, or neutral). Sentiment analysis is one method that can be used to overcome this problem. One approach to sentiment analysis is the lexicon-based approach, which depends heavily on the completeness and diversity of the sentiment lexicons. This study therefore adds entries to the sentiment lexicon to improve performance. The data used in this study were tweets about the 2018 West Java Governor election, the 2019 Presidential election, and the COVID-19 pandemic. The classification result is determined by the highest frequency of occurrence of words from the positive and negative sentiment lexicons. The results with lexicon addition are compared against two previous methods: the Lailiyah method and the Saputra and Nurhadryani method. Lexicon addition improved the accuracy of both the Lailiyah method and the Saputra and Nurhadryani method on all data sets, by 6.09% and 5.07% respectively on the 2018 West Java Governor election data, 9.16% and 5.9% on the 2019 Presidential election data, and 15.74% and 15.48% on the COVID-19 pandemic data. It also improved the weighted F1-measure of both methods on all data sets, by 4.85% and 2.09% respectively on the 2018 West Java Governor election data, 6.89% and 2.26% on the 2019 Presidential election data, and 12.18% and 5.10% on the COVID-19 pandemic data.
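The frequency-based classification rule and the two reported metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon entries, the whitespace tokenizer, the tie-to-neutral rule, and the names classify, accuracy, and weighted_f1 are all assumptions made for illustration, and the toy tweets are invented rather than taken from the study's data.

```python
from collections import Counter

# Hypothetical toy lexicons; the lexicons used in the paper are not reproduced here.
BASE_POSITIVE = {"bagus", "senang"}
BASE_NEGATIVE = {"buruk", "kecewa"}
# Hypothetical extra entries standing in for the lexicon addition step.
ADDED_POSITIVE = {"mantap"}
ADDED_NEGATIVE = {"parah"}


def classify(tweet, positive, negative):
    """Label a tweet by whichever lexicon matches more of its tokens."""
    tokens = tweet.lower().split()  # naive whitespace tokenizer (assumption)
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # ties and tweets with no matches (assumption)


def accuracy(gold, predicted):
    """Fraction of tweets whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def weighted_f1(gold, predicted):
    """Per-class F1 averaged with weights equal to each class's share of gold labels."""
    support = Counter(gold)
    total = 0.0
    for label, n in support.items():
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n > 0
        total += f1 * n / len(gold)
    return total


# Invented example tweets with hand-assigned gold labels.
tweets = ["pelayanan bagus dan mantap", "hasilnya parah", "cuaca hari ini"]
gold = ["positive", "negative", "neutral"]

before = [classify(t, BASE_POSITIVE, BASE_NEGATIVE) for t in tweets]
after = [
    classify(t, BASE_POSITIVE | ADDED_POSITIVE, BASE_NEGATIVE | ADDED_NEGATIVE)
    for t in tweets
]

print("accuracy before/after:", accuracy(gold, before), accuracy(gold, after))
print("weighted F1 before/after:", weighted_f1(gold, before), weighted_f1(gold, after))
```

In this toy run the second tweet matches nothing in the base lexicons and falls back to neutral; once "parah" is added to the negative lexicon it is labeled negative. That is the kind of coverage effect the abstract attributes to lexicon addition, though the size of the effect here says nothing about the reported figures.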
