Comparison of Machine Learning Algorithms for Spam Detection
Azeema Sadia, Fatima Bashir, Reema Qaiser Khan, Ammarah Khalid
Journal of Advances in Information Technology, published 2023-01-01
DOI: https://doi.org/10.12720/jait.14.2.178-184
Abstract
The Internet is a tool that offers people endless knowledge. It is a global platform for connectivity, communication, and sharing. At almost no cost, an individual can use it to send email, post tweets, and send Facebook messages to a vast number of people. These messages can also contain unsolicited advertisements, which are identified as spam. Twitter is heavily affected by spamming, which is an alarming issue for the company. Twitter considers spam to be actions that are unsolicited and repeated, including repeated tweets and URLs that lead users to completely unrelated websites. The authors worked with a Twitter dataset of tweets about "iPhone". The data were collected through an API and then pre-processed. In this paper, content-based features that identify spam tweets were selected using R. Multiple machine learning algorithms were applied to detect spam tweets: Naive Bayes, Logistic Regression, KNN, Decision Tree, and Support Vector Machine. The best performance was achieved by the Naive Bayes algorithm, with an accuracy of 89%.
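To make the pipeline described in the abstract concrete, the following is a minimal sketch of content-based feature extraction followed by a Naive Bayes classifier. It rests on several assumptions: it is written in Python rather than the R the authors used, the three features (URL present, many hashtags, repeated text) are a hypothetical feature set suggested by the abstract's description of spam and not the paper's actual features, the Bernoulli variant of Naive Bayes is one possible choice, and the six tweets are invented for illustration. The accuracy it computes is training accuracy on toy data and says nothing about the 89% reported in the paper.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")

def extract_features(tweet, seen_counts):
    """Map a tweet to binary content-based features (hypothetical feature set)."""
    text = tweet.lower()
    return (
        int(bool(URL_RE.search(text))),  # tweet contains a URL
        int(text.count("#") >= 3),       # tweet carries three or more hashtags
        int(seen_counts[text] > 1),      # identical text appears more than once
    )

class BernoulliNB:
    """Tiny Bernoulli Naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.p = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed estimate of P(feature_j = 1 | class c)
            self.p[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict_one(self, x):
        def score(c):
            # Log prior plus the log-likelihood of each binary feature
            return self.log_prior[c] + sum(
                math.log(p if v else 1 - p) for v, p in zip(x, self.p[c])
            )
        return max(self.classes, key=score)

    def predict(self, X):
        return [self.predict_one(x) for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented toy tweets: label 1 = spam, 0 = not spam
tweets = [
    ("Win a free iPhone now http://spam.example/win", 1),
    ("Win a free iPhone now http://spam.example/win", 1),
    ("#iPhone #free #win #deal click http://spam.example/deal", 1),
    ("Just upgraded my iPhone, the camera is great", 0),
    ("Is the new iPhone battery any better?", 0),
    ("Reading an iPhone review https://news.example/review", 0),
]

seen_counts = Counter(text.lower() for text, _ in tweets)
X = [extract_features(text, seen_counts) for text, _ in tweets]
y = [label for _, label in tweets]

model = BernoulliNB().fit(X, y)
preds = model.predict(X)
print("training accuracy on toy data:", accuracy(y, preds))
```

A comparison like the one in the paper would fit each of the five classifiers on the same feature matrix and score them on held-out tweets; the sketch covers only the Naive Bayes step and evaluates on its own training data for brevity.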