Sexual Harassment Detection using Machine Learning and Deep Learning Techniques for Bangla Text

Mujahidul Islam, Maqsudur Rahman, M. T. Ahmed, Abu Zafor Muhammad Islam, Dipankar Das, M. M. Hoque
2023 International Conference on Electrical, Computer and Communication Engineering (ECCE), published 2023-02-23
DOI: 10.1109/ECCE57851.2023.10101522 (https://doi.org/10.1109/ECCE57851.2023.10101522)

Abstract

Harassment is any act that annoys or upsets someone, and it can be classified into several categories. Sexual harassment, one of these categories, involves implicit or explicit sexual overtones, including inappropriate and unwelcome promises of rewards in exchange for sexual favors. As technology has become more advanced and widespread, toxic users have gained a large opportunity to spread toxicity on online platforms. Because the amount of Bangla text on social media platforms keeps growing, offensive Bangla text also needs to be filtered. The objective of this research is to detect and classify sexual harassment in Bangla text using machine learning and deep learning algorithms, and thereby to help prevent it. In the experiments, we combined TF-IDF with eight machine learning algorithms: Naive Bayes, Decision Tree, Random Forest, AdaBoost, SGD, Logistic Regression, KNN, and SVM, which achieved accuracies of 74.9%, 75.6%, 70.0%, 70.1%, 75.2%, 75.7%, 65.2%, and 76.5%, respectively. The deep learning models CNN, LSTM, and a hybrid CNN-LSTM each achieved an accuracy of 89%, outperforming the machine learning techniques.
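
The abstract does not state the tokenization, hyperparameters, or library used, so the following is only a minimal standard-library sketch of the general idea: TF-IDF features fed to a KNN classifier, one of the eight classical algorithms listed. The function names (`tfidf_fit`, `tfidf_transform`, `knn_predict`, `classify`), the smoothed IDF formula, the cosine-similarity vote, and the placeholder tokens standing in for Bangla text are all assumptions made for illustration, not the authors' implementation or data.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Learn smoothed IDF weights from tokenized training documents."""
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed smoothing: ln((1 + n) / (1 + df)) + 1, which is always positive.
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf_transform(doc, idf):
    """Map one tokenized document to an L2-normalised sparse TF-IDF vector."""
    tf = Counter(tok for tok in doc if tok in idf)  # ignore unseen tokens
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical placeholder corpus: 1 = harassing, 0 = not harassing.
# Real input would be preprocessed Bangla text, which the abstract does not show.
train_texts = [
    "flag_a flag_b word_x",
    "flag_b flag_c word_y",
    "flag_a flag_c word_z",
    "safe_a safe_b word_x",
    "safe_b safe_c word_y",
    "safe_a safe_c word_z",
]
train_labels = [1, 1, 1, 0, 0, 0]

tokenized = [text.split() for text in train_texts]  # whitespace tokens (assumed)
idf = tfidf_fit(tokenized)
train_vecs = [tfidf_transform(doc, idf) for doc in tokenized]


def classify(text, k=3):
    """Classify a new text with the fitted TF-IDF weights and KNN vote."""
    return knn_predict(tfidf_transform(text.split(), idf), train_vecs, train_labels, k)


print(classify("flag_a flag_b word_y"))
print(classify("safe_a safe_c word_x"))
```

Swapping `knn_predict` for another classifier leaves the TF-IDF stage unchanged, which is the sense in which one feature representation can be paired with each of the listed algorithms. A query made up entirely of unseen tokens maps to an empty vector here, so a real system would need an explicit fallback for that case.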