Spam Email Detection using ID3 Algorithm and Hidden Markov Model

V. Kumar, Monika, Parveen Kumar, Ambalika Sharma
2018 Conference on Information and Communication Technology (CICT), published 2018-10-01
DOI: 10.1109/INFOCOMTECH.2018.8722378 (https://doi.org/10.1109/INFOCOMTECH.2018.8722378)
Citations: 7

Abstract

Email is a primary means of communication over the Internet, but it is hampered by spam. Spam emails waste memory, money, time, and communication bandwidth, so they need to be identified and eliminated. This paper combines the ID3 algorithm, used to build decision trees, with a Hidden Markov Model, used to calculate the probabilities of events that may occur, to classify emails as spam or ham. The model labels an email as spam or ham by calculating its total probability from all the previously classified words in the email, and then supervises all processed emails by building their decision trees. For this purpose, an Enron dataset of 5172 emails is used that contains 2086 spam and 2086 ham pre-classified emails. The experimental results on this dataset show an accuracy of 89% on the spam emails.
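The abstract names ID3 as the decision-tree learner. As a rough illustration only, the sketch below shows the standard ID3 split criterion (information gain based on Shannon entropy) applied to word-presence features. The toy emails, the set-of-words representation, and the function names `entropy`, `information_gain`, and `best_split_word` are assumptions made for this example, not the authors' implementation, and the Hidden Markov Model stage is not covered.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(emails, labels, word):
    """Entropy reduction from splitting on whether `word` occurs in an email."""
    with_word = [lab for toks, lab in zip(emails, labels) if word in toks]
    without_word = [lab for toks, lab in zip(emails, labels) if word not in toks]
    total = len(labels)
    # Weighted entropy of the two branches after the split.
    remainder = (
        (len(with_word) / total) * entropy(with_word)
        + (len(without_word) / total) * entropy(without_word)
    )
    return entropy(labels) - remainder


def best_split_word(emails, labels, vocabulary):
    """Pick the word with the highest information gain (the ID3 root choice)."""
    return max(vocabulary, key=lambda w: information_gain(emails, labels, w))


# Hypothetical toy data: each email is a set of tokens, not real Enron content.
emails = [
    {"win", "free", "prize"},
    {"free", "offer", "now"},
    {"meeting", "schedule", "monday"},
    {"project", "report", "meeting"},
]
labels = ["spam", "spam", "ham", "ham"]
vocabulary = sorted(set().union(*emails))

root_word = best_split_word(emails, labels, vocabulary)
print(root_word, information_gain(emails, labels, root_word))
```

ID3 would apply the same selection recursively to each branch until the labels in a branch are pure or no words remain. How the paper feeds the HMM probabilities into the trees is not specified in the abstract, so that link is left out here.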