Toxic Comment Classification on Social Media Using Support Vector Machine and Chi Square Feature Selection
N. Azzahra, D. Murdiansyah, K. Lhaksmana
International Journal on Information and Communication Technology (IJoICT), published 2021-07-02
DOI: 10.21108/ijoict.v7i1.552 (https://doi.org/10.21108/ijoict.v7i1.552)
Citations: 4
Abstract
Social media use continues to grow, and its ease of access and familiarity make it easier for irresponsible users to behave unethically, for example by spreading hatred, defamation, radicalism, and pornography. Regulations govern activity on social media, but they are not yet working effectively. In this study, we classify toxic comments containing such unethical content using a Support Vector Machine (SVM), with TF-IDF for feature extraction and Chi Square for feature selection. In our experiments, the best performance, an F1-score of 76.57%, was obtained by an SVM with a linear kernel, with stemming and stopword removal applied and without Chi Square feature selection.
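The abstract does not say how Chi Square feature selection was computed, so the following is only a minimal sketch of one common textbook formulation: each term is scored from a 2x2 table of term presence against class label, and the highest-scoring terms are kept. The function names (`chi_square_scores`, `select_top_k`) and the four toy comments are hypothetical and chosen for illustration; they are not taken from the paper or its dataset.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term with the chi-square statistic of its 2x2 term/class table.

    docs   : list of token lists (already preprocessed)
    labels : list of 0/1 labels, where 1 means toxic (assumed encoding)
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos

    # Count, per class, how many documents contain each term at least once.
    df_pos, df_neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (df_pos if label == 1 else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # toxic documents containing the term
        b = df_neg[term]   # non-toxic documents containing the term
        c = n_pos - a      # toxic documents without the term
        d = n_neg - b      # non-toxic documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the table is degenerate; score it as 0.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy, made-up comments used only to exercise the functions.
docs = [
    ["you", "are", "stupid"],
    ["stupid", "idiot"],
    ["you", "are", "kind"],
    ["have", "a", "nice", "day"],
]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels)
top_terms = select_top_k(scores, 1)
```

In this toy data, a term that appears in both classes equally often (such as "you") scores zero, while a term confined to one class scores highest. Libraries may define the statistic differently, for example over TF-IDF weights instead of term presence, so the scores here should not be expected to match any particular implementation.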