A Lightweight Machine Learning-Based Email Spam Detection Model Using Word Frequency Pattern
Mohamed Aly Bouke, Azizol Abdullah, Mohd Taufik Abdullah, S. Zaid, Hayate El Atigh, Sameer Hamoud Alshatebi
Journal of Information Technology and Computing, published 2023-06-27
DOI: 10.48185/jitc.v4i1.653 (https://doi.org/10.48185/jitc.v4i1.653)
Abstract
Spam emails have become a severe problem that irritates recipients and consumes their time. Existing spam detection techniques have low detection rates and cannot tolerate high-dimensional data. Moreover, because machine learning algorithms are effective at identifying mail as solicited or unsolicited, machine learning approaches have become common in spam detection systems. This paper proposes a lightweight machine learning-based spam detection model based on the Random Forest (RF) algorithm. According to the empirical results, the proposed model achieved 97% accuracy on the Spambase dataset. The performance of the proposed model was evaluated using standard classification metrics such as F-score, Recall, Precision, and Accuracy. A comparison of our model with the state-of-the-art works investigated in this paper showed that the model performs better, with an improvement of 6% for all metrics.
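The four metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name classification_metrics, the Metrics container, and the eight example labels are hypothetical and chosen only for illustration. Spam is assumed to be the positive class (label 1), and the F-score is assumed to be the balanced F1 variant.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f_score: float


def classification_metrics(y_true, y_pred, positive=1):
    """Compute binary classification metrics, treating `positive` as spam."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with spam as the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0  # F1 (assumed)
    return Metrics(accuracy, precision, recall, f_score)


# Hypothetical labels for eight emails: 1 = spam, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
m = classification_metrics(y_true, y_pred)
print(m)
```

In this toy example one spam email is missed and one legitimate email is flagged, so all four metrics come out to 0.75. On real data the metrics usually differ from one another, which is why reporting all of them is more informative than accuracy alone.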