Detecting spam tweets using machine learning and effective preprocessing

Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. Pub Date: 2021-11-08. DOI: 10.1145/3487351.3490968

Berk Kardaş, İsmail Erdem Bayar, Tansel Özyer, R. Alhajj


Citations: 5

Abstract

With the rapid rise in popularity of online social networks (OSNs), these platforms have become ideal places for spammers, who can easily publish malicious content and advertise phishing scams by taking advantage of OSNs. Effective identification and filtering of spam tweets would therefore benefit both OSNs and their users. However, the sheer volume of posts makes it increasingly difficult to check and eliminate spam tweets. Motivated by these observations, in this paper we propose an approach for detecting spam tweets using machine learning and effective preprocessing techniques. The approach examines the benefits of preprocessing and identifies which preprocessing techniques are the most effective. To compare these techniques, the UtkML Twitter spam dataset is used in testing. Once the most effective methods are determined, they are combined to further improve spam tweet detection accuracy. We evaluated our solution with four machine learning algorithms: Naïve Bayes classifier, Neural Network, Logistic Regression, and Support Vector Machine (SVM). With the SVM classifier, we achieve an accuracy of 93.02%. Experimental results show that our approach can effectively improve the performance of spam tweet classification.
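
The abstract does not list the individual preprocessing techniques that were compared, so the following is only a minimal sketch of what a tweet-cleaning step of this kind might look like. The function names (preprocess_tweet, accuracy), the placeholder tokens, and the specific cleaning steps (lowercasing, replacing URLs and mentions, stripping punctuation) are assumptions chosen for illustration, not the authors' pipeline.

```python
import re

# Hypothetical cleaning rules; the abstract does not say which techniques were used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_WORD_RE = re.compile(r"[^a-z0-9_<>\s]")


def preprocess_tweet(text: str) -> list[str]:
    """Normalize a raw tweet into a list of lowercase tokens."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)       # replace links with a placeholder token
    text = MENTION_RE.sub(" <user> ", text)  # replace @mentions with a placeholder token
    text = HASHTAG_RE.sub(r" \1 ", text)     # keep the hashtag word, drop the '#'
    text = NON_WORD_RE.sub(" ", text)        # strip remaining punctuation and symbols
    return text.split()


def accuracy(predicted: list[int], actual: list[int]) -> float:
    """Fraction of labels predicted correctly (the metric reported in the abstract)."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    if not actual:
        raise ValueError("label lists must not be empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Tokens produced this way could then be turned into features for any of the four classifiers named above; comparing accuracy with and without each cleaning step is one way to judge which step helps, under the assumptions stated here.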

