Spam Email Detection Using Naive Bayes and Particle Swarm Optimization (PSO)

Muhamad Abdul Ghani, Hamdun Sulaiman
Journal: Infotek : Jurnal Informatika dan Teknologi
Published: 2023-01-23
DOI: 10.29408/jit.v6i1.7049 (https://doi.org/10.29408/jit.v6i1.7049)
Citations: 1

Abstract

Original title (Indonesian): Deteksi Spam Email dengan Metode Naive Bayes dan Particle Swarm Optimization (PSO)
Internet-based technology has become a primary need. According to a survey by the Central Statistics Agency conducted in collaboration with APJII, sending and receiving email has overtaken social media use, reaching 95.75%. Such heavy use of email has both positive and negative effects: besides serving as a communication tool, email is not always used well, and it is frequently misused in ways that can harm others. Misused email of this kind is commonly known as spam or junk mail, and it may contain advertisements, scams, and even viruses. In this study, Gmail emails were processed with text mining and then tested with several data mining classification methods, including the Naïve Bayes algorithm, SVM, and Random Forest, combined with Particle Swarm Optimization (PSO), to predict spam emails and identify the most accurate algorithm. Measuring the performance of the four algorithms with a confusion matrix and ROC shows that the Naïve Bayes algorithm with PSO achieves the highest accuracy, 81.40%, with an AUC of 0.78.
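The abstract does not say how PSO is coupled to the classifier, so the following is only a minimal sketch under stated assumptions: PSO is used here for feature selection (a binary PSO with a sigmoid transfer function), the classifier is a multinomial Naive Bayes with Laplace smoothing, and the fitness is validation accuracy. The toy corpus, the function names, and the PSO parameters (`w`, `c1`, `c2`, swarm size) are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (NOT the paper's Gmail data): (tokens, label), 1 = spam.
TRAIN = [
    ("win free prize now".split(), 1),
    ("free offer click now".split(), 1),
    ("cheap loan offer win".split(), 1),
    ("meeting agenda for monday".split(), 0),
    ("project report attached".split(), 0),
    ("lunch on monday".split(), 0),
]
VALID = [
    ("free prize offer".split(), 1),
    ("click to win now".split(), 1),
    ("monday project meeting".split(), 0),
    ("report for lunch meeting".split(), 0),
]
VOCAB = sorted({t for tokens, _ in TRAIN for t in tokens})


def train_nb(data, features):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to `features`."""
    class_docs = Counter(label for _, label in data)
    counts = {c: Counter() for c in class_docs}
    for tokens, label in data:
        counts[label].update(t for t in tokens if t in features)
    model = {}
    for c in class_docs:
        total = sum(counts[c].values()) + len(features)
        log_prior = math.log(class_docs[c] / len(data))
        log_like = {t: math.log((counts[c][t] + 1) / total) for t in features}
        model[c] = (log_prior, log_like)
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unknown tokens are ignored."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)


def accuracy(mask):
    """Fitness: validation accuracy of Naive Bayes trained on the masked features."""
    features = {t for t, keep in zip(VOCAB, mask) if keep}
    model = train_nb(TRAIN, features)
    hits = sum(predict(model, tokens) == label for tokens, label in VALID)
    return hits / len(VALID)


def binary_pso(n_particles=8, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature masks; parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(VOCAB)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    pos[0] = [1] * dim  # one particle starts at "all features" (the baseline)
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [accuracy(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Sigmoid transfer: velocity sets the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = accuracy(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    mask, fit = binary_pso()
    kept = [t for t, keep in zip(VOCAB, mask) if keep]
    print("baseline accuracy:", accuracy([1] * len(VOCAB)))
    print("PSO-selected accuracy:", fit, "features kept:", kept)
```

Because one particle starts with every feature enabled, the best fitness found can never fall below the all-features baseline on the validation set. On data this small the numbers mean nothing; a realistic evaluation would use a held-out test set and report the confusion matrix and AUC, as the study does.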