Detection of Indonesian Hate Speech in the Comments Column of Indonesian Artists' Instagram Using the RoBERTa Method

Adhe Akram Azhari, Yuliant Sibaroni, Sri Suryani Prasetiyowati
Journal: JIPI Jurnal IPA dan Pembelajaran IPA
Publication date: 2023-08-30
Publication type: Journal Article
DOI: 10.29100/jipi.v8i3.3898 (https://doi.org/10.29100/jipi.v8i3.3898)
Citation count: 0

Abstract

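This study detects hate speech in comments on Instagram posts using RoBERTa. RoBERTa was chosen because it classifies English text with high accuracy compared with other models and may have good potential for Indonesian, the language used in this research. Two test scenarios were evaluated: full preprocessing and non-full preprocessing. Full preprocessing comprises cleansing, case folding, normalization, tokenization, and stemming; non-full preprocessing applies all of these stages except stemming. In the experiments, non-full preprocessing achieved a higher average accuracy than full preprocessing, reaching 85.09%. This indicates that RoBERTa predicts comments well when stemming is omitted.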
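The two preprocessing scenarios can be illustrated with a minimal Python sketch. The abstract does not say which tools or word lists the authors used, so everything here is an assumption made for illustration: the `SLANG` map, the `toy_stem` placeholder stemmer, and the function names are hypothetical, and a real experiment would presumably use a proper Indonesian stemmer and a far larger normalization lexicon. The sketch only shows how the two scenarios differ by a single stage.

```python
import re
from typing import Callable, Dict, List

# Hypothetical slang-to-standard map (assumption, not from the paper).
SLANG: Dict[str, str] = {"gk": "tidak", "yg": "yang", "bgt": "banget"}


def cleanse(text: str) -> str:
    """Cleansing: drop URLs, @mentions, hashtags, digits, and punctuation."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text: str) -> str:
    """Normalization: replace known slang words with their standard form."""
    return re.sub(r"\b\w+\b", lambda m: SLANG.get(m.group(), m.group()), text)


def toy_stem(token: str) -> str:
    """Placeholder stemmer (assumption): strips a few common suffixes."""
    for suffix in ("nya", "kan", "lah"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(
    text: str, full: bool, stem: Callable[[str], str] = toy_stem
) -> List[str]:
    """Run the stages in the order the abstract lists them.

    full=True  -> cleansing, case folding, normalization, tokenization, stemming
    full=False -> the same stages without stemming
    """
    text = cleanse(text)
    text = text.lower()          # case folding
    text = normalize(text)
    tokens = text.split()        # whitespace tokenization
    if full:
        tokens = [stem(t) for t in tokens]
    return tokens


comment = "Jelek bgt mukanya @artis https://x.co/abc !!!"
print(preprocess(comment, full=False))
print(preprocess(comment, full=True))
```

The `stem` parameter is left pluggable so that the placeholder can be swapped for a real stemmer without touching the rest of the pipeline. The resulting tokens would then be passed to the classifier's own tokenizer, which is not shown here.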