Spam Email Detection using ID3 Algorithm and Hidden Markov Model
V. Kumar, Monika, Parveen Kumar, Ambalika Sharma
2018 Conference on Information and Communication Technology (CICT), 2018-10-01
DOI: 10.1109/INFOCOMTECH.2018.8722378 (https://doi.org/10.1109/INFOCOMTECH.2018.8722378)
Citations: 7
Abstract
Email is a primary means of communication over the Internet, but it is burdened by spam. Spam emails waste memory, money, time, and communication bandwidth, so they need to be identified and eliminated. This paper combines the ID3 algorithm, used to build decision trees, with a Hidden Markov Model, used to calculate the probabilities of events that may occur, to classify emails as spam or ham. The model labels an email by calculating its total probability from all previously classified words it contains, and then supervises all processed emails by building their decision trees. For this purpose, an Enron dataset of 5172 emails is used, containing 2086 spam and 2086 ham pre-classified emails. Experimental results on this dataset show an accuracy of 89% on the spam emails.
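The abstract does not say how the decision trees are built beyond naming ID3, so the following is only a minimal sketch of the standard ID3 split criterion (information gain), not the authors' implementation. The word features `free` and `meeting`, the four toy emails, and the function names are assumptions made up for illustration; they are not taken from the Enron data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the emails on one feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        # Group the labels by the value this email has for the feature.
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: each row records whether a word occurs in an email.
rows = [
    {"free": True, "meeting": False},
    {"free": True, "meeting": False},
    {"free": False, "meeting": True},
    {"free": False, "meeting": False},
]
labels = ["spam", "spam", "ham", "ham"]

# ID3 chooses the feature with the highest information gain as the next node.
gains = {f: information_gain(rows, labels, f) for f in ("free", "meeting")}
best = max(gains, key=gains.get)
print(gains, best)
```

On this toy data the word `free` separates the two classes perfectly, so ID3 would place it at the root. A full ID3 implementation would recurse on each resulting subset until the labels are pure or no features remain.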