Spam filtering techniques and MapReduce with SVM: A study
Amol G. Kakade, P. Kharat, A. Gupta, Tarun Batra
2014 Asia-Pacific Conference on Computer Aided System Engineering (APCASE), 2014-10-16
DOI: 10.1109/APCASE.2014.6924472 (https://doi.org/10.1109/APCASE.2014.6924472)
Citations: 6
Abstract
Spam is the most dangerous threat to email systems today. Spam is any unwanted and harmful mail, so separating it from legitimate mail is essential. This paper surveys spam filtering techniques, the problems of training a Support Vector Machine (SVM), and the need to introduce MapReduce (Hadoop) for SVM training. Techniques for separating spam are word-based, content-based, machine-learning-based, and hybrid. Machine learning techniques are the most popular because of their high accuracy and mathematical foundation. SVM is the most widely used machine learning technique in spam filtering because of its ability to handle data with a large number of attributes. The hurdles in training an SVM are its long training time and its inability to take a large dataset as a single input. Both problems can be addressed by implementing the training algorithm on the MapReduce (Hadoop) framework, which gives up to a 6-fold speedup over the sequential algorithm.
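The abstract does not say how SVM training is split across map and reduce tasks, so the following is only a minimal single-process sketch of one common pattern, not the authors' algorithm. It assumes a linear SVM trained per partition with a Pegasos-style stochastic subgradient method (the map step) and a size-weighted average of the resulting weight vectors (the reduce step). All names (`train_linear_svm`, `combine`, `mapreduce_svm`, `make_toy_data`) and the three-feature toy data are hypothetical, and no Hadoop API is used.

```python
import random
from functools import reduce


def train_linear_svm(samples, epochs=50, lam=0.01, seed=0):
    """Map step (assumed): Pegasos-style subgradient descent on one partition.

    samples is a list of (features, label) pairs with label in {-1, +1}.
    Returns the learned weight vector and the partition size.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (regularization term of the subgradient).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss term applies only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w, len(samples)


def combine(a, b):
    """Reduce step (assumed): size-weighted average of two partial models."""
    (wa, na), (wb, nb) = a, b
    n = na + nb
    return [(na * p + nb * q) / n for p, q in zip(wa, wb)], n


def mapreduce_svm(data, n_partitions=4):
    """Split the data, train one model per partition, then merge them."""
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    mapped = [train_linear_svm(p, seed=i) for i, p in enumerate(partitions)]
    w, _ = reduce(combine, mapped)
    return w


def predict(w, x):
    """Return +1 (spam) or -1 (legitimate) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def make_toy_data(n=200, seed=42):
    """Hypothetical features: [bias, spam-word score, normal-word score]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < 0.5:
            data.append(([1.0, rng.uniform(2, 5), rng.uniform(0, 1)], 1))
        else:
            data.append(([1.0, rng.uniform(0, 1), rng.uniform(2, 5)], -1))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    w = mapreduce_svm(data)
    accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
    print("weights:", w)
    print("training accuracy:", accuracy)
```

In this sketch each partition is small enough to train on independently, which is how a split of this kind would address both hurdles named in the abstract: the partitions can be processed in parallel, and no single task needs the whole dataset as input. Averaging weight vectors only works for linear models; a kernel SVM would need a different merge step, such as retraining on the collected support vectors.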