An application of machine learning to detect abusive Bengali text
S. C. Eshan, M. S. Hasan
2017 20th International Conference of Computer and Information Technology (ICCIT), 2017-12-22
DOI: 10.1109/ICCITECHN.2017.8281787 (https://doi.org/10.1109/ICCITECHN.2017.8281787)
Citations: 61
Abstract
Detecting abusive Bengali text can help prevent cyberbullying and online harassment, crimes that are increasing rapidly in Bangladesh. A machine learning approach can keep a detection system up to date with the new tactics that abusers adopt. This paper investigates machine learning algorithms such as Random Forest, Multinomial Naïve Bayes, and Support Vector Machine (SVM) with linear, Radial Basis Function (RBF), polynomial, and sigmoid kernels, and compares them on unigram-, bigram-, and trigram-based CountVectorizer and TfidfVectorizer features. The results show that the SVM with a linear kernel performs best with trigram TfidfVectorizer features.
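To make the feature types named in the abstract concrete, the following is a minimal standard-library sketch of word n-gram count features and a TF-IDF reweighting of them. It is not the authors' code and not the scikit-learn implementation: the function names, the whitespace tokenizer, the choice to pool 1- to 3-grams together, the smoothed IDF variant, and the toy documents (English placeholders standing in for Bengali comments) are all assumptions made for illustration.

```python
from collections import Counter
import math


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max (whitespace tokenization)."""
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def count_features(docs, n_min=1, n_max=3):
    """Raw n-gram counts per document: the idea behind count-based features."""
    return [Counter(word_ngrams(d, n_min, n_max)) for d in docs]


def tfidf_features(docs, n_min=1, n_max=3):
    """Counts reweighted by a smoothed inverse document frequency (assumed variant)."""
    counts = count_features(docs, n_min, n_max)
    n_docs = len(docs)
    # Iterating a Counter yields each distinct n-gram once, so this
    # counts the number of documents that contain each n-gram.
    doc_freq = Counter(g for c in counts for g in c)
    weighted = []
    for c in counts:
        weighted.append({
            g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0)
            for g, tf in c.items()
        })
    return weighted


# Hypothetical toy corpus; real input would be Bengali comments.
docs = ["you are kind", "you are rude", "so rude"]
counts = count_features(docs)
tfidf = tfidf_features(docs)
```

In this sketch, an n-gram that appears in fewer documents (such as "kind") receives a larger weight than one that appears in many (such as "you"), which is the intuition behind preferring TF-IDF features over raw counts. The resulting per-document dictionaries would then be turned into vectors and passed to a classifier such as a linear-kernel SVM.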