Detecting spam tweets using machine learning and effective preprocessing

Berk Kardaş, İsmail Erdem Bayar, Tansel Özyer, R. Alhajj
Published in: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Publication date: 2021-11-08
DOI: 10.1145/3487351.3490968 (https://doi.org/10.1145/3487351.3490968)
Citation count: 5

Abstract

With the rapid rise in popularity of online social networks (OSNs), these platforms have become ideal venues for spammers, who can easily exploit them to publish malicious content and advertise phishing scams. Effective identification and filtering of spam tweets would therefore benefit both OSNs and their users. However, the sheer volume of posts makes it increasingly difficult to check for and eliminate spam tweets. Motivated by these observations, we propose an approach for detecting spam tweets using machine learning and effective preprocessing techniques. We examine the benefits of preprocessing and identify which preprocessing techniques are the most effective. To compare these techniques, the UtkML Twitter spam dataset is used for testing. Once the most effective methods have been determined, they are combined to further improve spam detection accuracy. We evaluated our solution with four machine learning algorithms: Naïve Bayes, Neural Network, Logistic Regression, and Support Vector Machine (SVM). With the SVM classifier, we achieve an accuracy of 93.02%. Experimental results show that our approach can effectively improve the performance of spam tweet classification.
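The abstract does not list the specific preprocessing techniques that were compared. As an illustration only, the sketch below shows the kind of tweet normalization commonly applied before feature extraction: lowercasing, replacing URLs and user mentions with placeholder tokens, unwrapping hashtags, and stripping punctuation. The function name `preprocess_tweet`, the placeholder tokens `<url>` and `<user>`, and the choice of steps are assumptions made for this example, not the authors' pipeline.

```python
import re

# Hypothetical normalization rules; the paper's actual steps are not given
# in the abstract, so these are common choices rather than the authors' own.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_ALNUM_RE = re.compile(r"[^a-z0-9<>\s]")  # keeps < and > for placeholders
WHITESPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text: str) -> str:
    """Normalize a raw tweet into a cleaned, lowercase token string."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)        # collapse every link to one token
    text = MENTION_RE.sub(" <user> ", text)   # collapse every @mention
    text = HASHTAG_RE.sub(r" \1 ", text)      # keep the hashtag word, drop '#'
    text = NON_ALNUM_RE.sub(" ", text)        # strip punctuation and symbols
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "WIN a FREE iPhone!!! Visit http://spam.example/win now @john #giveaway"
    print(preprocess_tweet(raw))
    # prints: win a free iphone visit <url> now <user> giveaway
```

The cleaned strings could then be vectorized (for example with bag-of-words or TF-IDF features) and passed to any of the four classifiers named in the abstract. Toggling individual steps on and off is one way to measure how much each preprocessing technique contributes to accuracy.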