Cyberbullying Detection on Indonesian Twitter using Doc2Vec and Convolutional Neural Network
Shindy Trimaria Laxmi, Rita Rismala, Hani Nurrahmi
2021 9th International Conference on Information and Communication Technology (ICoICT), published 2021-08-03
DOI: 10.1109/ICoICT52021.2021.9527420 (https://doi.org/10.1109/ICoICT52021.2021.9527420)
Cyberbullying is the act of threatening or endangering others by posting text or images that humiliate or harass people through the internet or other communication devices. According to a survey on cyberbullying by Polling Indonesia and Asosiasi Penyelenggara Jasa Internet Indonesia (APJII), 49% of 5,900 participants reported having been bullied. This research therefore aims to help prevent cyberbullying, especially in Indonesia. We collected tweets from Twitter based on Twitter's trending keywords that correlated with cyberbullying events, then combined them with data from previous research. The resulting dataset contains 1,425 tweets, of which 393 are labeled as cyberbullying and 1,032 as non-cyberbullying. We then built a Doc2Vec model for feature extraction and trained classifiers using baseline methods (support vector machine, SVM, and random forest, RF) and a convolutional neural network (CNN) to detect cyberbullying text. The results show that the classifier combining CNN and Doc2Vec achieves the highest F1-score, 65.08%.
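The class counts above imply an imbalanced dataset, which is one likely reason the results are reported as an F1-score rather than accuracy. The following is a minimal sketch of that arithmetic in plain Python, not the authors' evaluation code. It assumes the cyberbullying class is treated as the positive class (the abstract does not say whether F1 is computed per class or averaged), and the `f1_score` helper and the majority-class baseline are illustrative constructs introduced here.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` label: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Class counts reported in the abstract.
N_CYBERBULLY = 393
N_NON_CYBERBULLY = 1032
TOTAL = N_CYBERBULLY + N_NON_CYBERBULLY

# Share of tweets in the positive (cyberbullying) class.
positive_share = N_CYBERBULLY / TOTAL

# Hypothetical baseline (an assumption, not from the paper):
# always predict the majority class, non-cyberbullying (label 0).
y_true = [1] * N_CYBERBULLY + [0] * N_NON_CYBERBULLY
majority_pred = [0] * TOTAL

majority_accuracy = sum(t == p for t, p in zip(y_true, majority_pred)) / TOTAL
majority_f1 = f1_score(y_true, majority_pred)

print(f"positive share:    {positive_share:.4f}")
print(f"majority accuracy: {majority_accuracy:.4f}")
print(f"majority F1:       {majority_f1:.4f}")
```

Under these assumptions, a classifier that never flags a tweet would still be right on roughly 72% of the data while detecting no cyberbullying at all, so its F1 on the positive class is zero. This is why the reported 65.08% is more informative than an accuracy figure would be.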