Toxicity Detection on Bengali Social Media Comments using Supervised Models
Nayan Banik, Md. Hasan Hafizur Rahman
2019 2nd International Conference on Innovation in Engineering and Technology (ICIET), published 2019-12-23
DOI: 10.1109/ICIET48527.2019.9290710 (https://doi.org/10.1109/ICIET48527.2019.9290710)
Citations: 19
Abstract
Social media plays an indispensable role in our daily lives, providing a public platform to share opinions, including threats, spam, and vulgar words, which are often referred to as toxic comments. Such expression reflects anti-social behavior on the part of the commenters and may harm the online atmosphere. Filtering toxic comments with handcrafted rules is cumbersome because the comments are unstructured and often contain misspelled obscene words. Automated machine learning-based models that classify such comments form part of Sentiment Analysis and are used extensively for the English language, where they show more promising results than statistical models. Although Bengali is widely spoken around the globe, little research has been done on detecting toxic comments in this language. Hence, in this paper, we provide a comparative analysis of five supervised learning models (Naive Bayes, Support Vector Machines, Logistic Regression, Convolutional Neural Network, and Long Short Term Memory) for detecting toxic Bengali comments in an annotated, publicly available dataset. Our results show that both deep learning-based models outperformed the other classifiers by a 10% margin, with the Convolutional Neural Network achieving the highest accuracy of 95.30%.
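To make the evaluation setup concrete, the following is a minimal sketch of one of the five models named above, a multinomial Naive Bayes classifier scored by accuracy on held-out comments. It is not the authors' implementation. The whitespace tokenizer, the add-one smoothing, the class and function names, and the tiny English placeholder comments are all assumptions made for illustration; the abstract does not describe the preprocessing, the features, or the dataset contents.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Plain whitespace tokenization (an assumption, not the paper's preprocessing).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:              # ignore unseen tokens
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    # Fraction of held-out comments whose predicted label matches the annotation.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical placeholder comments; a real run would use the annotated Bengali dataset.
train_texts = [
    "you are stupid and ugly",
    "i will hurt you",
    "thanks for sharing this",
    "what a lovely photo",
]
train_labels = ["toxic", "toxic", "clean", "clean"]
test_texts = ["you are ugly", "lovely photo thanks"]
test_labels = ["toxic", "clean"]

model = NaiveBayes().fit(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

The same `accuracy` helper could score any of the other four models, provided each exposes a `predict` method; that shared metric is what makes a comparison like the one reported above possible.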