Detection of Indonesian Hate Speech in the Comments Column of Indonesian Artists' Instagram Using the RoBERTa Method

JIPI Jurnal IPA dan Pembelajaran IPA | Pub Date: 2023-08-30 | DOI: 10.29100/jipi.v8i3.3898

Adhe Akram Azhari, Yuliant Sibaroni, Sri Suryani Prasetiyowati



Abstract



This study detects hate speech in comments on Instagram posts using RoBERTa. RoBERTa was chosen because it classifies English text with high accuracy compared with other models and may therefore also have good potential for Indonesian, the language used in this research. Two test scenarios were evaluated: full preprocessing and non-full preprocessing. Full preprocessing comprises cleansing, case folding, normalization, tokenization, and stemming; non-full preprocessing applies all of these stages except stemming. The experimental results show that non-full preprocessing yields a higher average accuracy than full preprocessing, with an average accuracy of 85.09%. This indicates that RoBERTa predicts comments well when full preprocessing is not used.
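The two scenarios differ only in whether stemming runs as the last stage. The following is a minimal Python sketch of that difference, not the authors' implementation. The slang dictionary `SLANG`, the suffix list `SUFFIXES`, the helper `toy_stem`, and the cleansing rules are all hypothetical stand-ins, since the abstract does not specify them; a real pipeline would likely use a proper Indonesian stemmer and a much larger normalization dictionary.

```python
import re

# Hypothetical slang-to-standard map; the paper's normalization dictionary is not given.
SLANG = {"gak": "tidak", "yg": "yang"}

# Toy suffix list standing in for a real Indonesian stemmer (an assumption).
SUFFIXES = ("nya", "kan", "lah")


def cleanse(text):
    """Cleansing: drop URLs, @mentions, hashtags, then any non-letter characters."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    return re.sub(r"[^A-Za-z\s]", " ", text)


def normalize(text):
    """Normalization: replace known slang words with their standard form."""
    return re.sub(r"\w+", lambda m: SLANG.get(m.group(), m.group()), text)


def toy_stem(token):
    """Strip one listed suffix if at least three letters of the stem remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, full=True):
    """Run the stages in the order listed in the abstract.

    full=True  -> full preprocessing (includes stemming)
    full=False -> non-full preprocessing (everything except stemming)
    """
    text = cleanse(text)      # cleansing
    text = text.lower()       # case folding
    text = normalize(text)    # normalization
    tokens = text.split()     # tokenization (plain whitespace split)
    if full:
        tokens = [toy_stem(t) for t in tokens]  # stemming
    return tokens


comment = "Filmnya gak bagus @user https://x.co !!"
print(preprocess(comment, full=True))   # ['film', 'tidak', 'bagus']
print(preprocess(comment, full=False))  # ['filmnya', 'tidak', 'bagus']
```

In this sketch the non-full variant keeps surface forms such as "filmnya" intact, and the resulting text would then be passed to the RoBERTa tokenizer and classifier. The abstract itself gives no explanation for the accuracy difference between the two scenarios.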

