Damayanti Elisabeth, I. Budi, Muhammad Okky Ibrohim
2020 8th International Conference on Information and Communication Technology (ICoICT), published 2020-06-01. DOI: 10.1109/ICoICT49345.2020.9166251 (https://doi.org/10.1109/ICoICT49345.2020.9166251)
Hate Code Detection in Indonesian Tweets using Machine Learning Approach: A Dataset and Preliminary Study
Social media has had the side effect of turning freedom of speech into freedom to hate. People spread hate speech in creative ways to evade hate speech detectors. Implicit intent is expressed using many codes, whose purpose is to disguise the targets of the hate speech. This paper presents an implementation of hate code detection for Indonesian tweets using machine learning and a classification explainer. First, we developed a ground-truth dataset of hate codes. We generated hate codes in two scenarios: hate codes from hate speech classification and hate codes from hate code classification. We used Logistic Regression (LR), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) as our classifiers, with TF-IDF and word bigrams as the features. The codes take the form of words and phrases. The best f-measure is 94.90%, obtained from hate code classification using Logistic Regression with abusive code elimination. This number reflects how well the model detects tweets that have no hate codes. For tweets annotated as containing hate codes, the f-measure for recognizing all the hate codes is 28.23%, and the recall is 56.91%.
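As a rough illustration of the feature step named in the abstract, the sketch below builds TF-IDF weights over word unigrams and bigrams in plain Python. It is not the authors' implementation: the abstract does not state the tokenization, the TF-IDF variant, or whether unigrams were combined with bigrams, so the whitespace tokenizer, the smoothed idf formula, and the function names `word_ngrams` and `tfidf_vectors` are all assumptions made for this example. The input strings are placeholders, not tweets from the dataset.

```python
import math
from collections import Counter


def word_ngrams(text, n_values=(1, 2)):
    """Lowercase, split on whitespace, and return word n-grams as strings."""
    tokens = text.lower().split()
    grams = []
    for n in n_values:
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf_vectors(docs, n_values=(1, 2)):
    """Return one {ngram: weight} dict per document.

    Assumed weighting: raw term frequency times a smoothed
    idf = ln((1 + N) / (1 + df)) + 1. This is one common variant,
    chosen for illustration only.
    """
    counts = [Counter(word_ngrams(doc, n_values)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for count in counts:
        df.update(count.keys())
    idf = {gram: math.log((1 + n_docs) / (1 + df[gram])) + 1 for gram in df}
    return [{gram: tf * idf[gram] for gram, tf in count.items()} for count in counts]


# Placeholder texts (hypothetical, not from the paper's dataset).
docs = ["alpha beta gamma", "alpha beta delta", "epsilon zeta"]
vectors = tfidf_vectors(docs)
```

In this toy corpus, an n-gram that appears in only one document (such as "gamma") receives a higher weight than one shared across documents (such as "alpha"), which is the property that makes rare code words and phrases stand out to a downstream classifier such as LR, NB, or RFDT.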