Spam filtering techniques and MapReduce with SVM: A study
Amol G. Kakade, P. Kharat, A. Gupta, Tarun Batra
2014 Asia-Pacific Conference on Computer Aided System Engineering (APCASE), published 2014-10-16
DOI: https://doi.org/10.1109/APCASE.2014.6924472
Spam is the most dangerous threat to email systems today. Spam is any unwanted and harmful mail, so separating it from legitimate mail is essential. This paper surveys spam filtering techniques, the problems that arise in training a Support Vector Machine (SVM), and the need to introduce MapReduce (Hadoop) for SVM training. Techniques for separating spam fall into four groups: word based, content based, machine learning based, and hybrid. Machine learning techniques are the most popular because of their high accuracy and mathematical foundation. SVM is the most widely used machine learning technique in spam filtering because of its ability to handle data with a large number of attributes. SVM training faces two hurdles: it takes a long time, and a large dataset cannot be given as input. Both problems can be solved by implementing the training algorithm on the MapReduce (Hadoop) framework, which gives up to a 6-fold speedup over the sequential algorithm.
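The idea of splitting SVM training into map and reduce steps can be illustrated with a small sketch. It is not the algorithm evaluated in the paper, which the abstract does not specify. It assumes a linear SVM trained by batch subgradient descent on the hinge loss, and it uses plain Python in place of Hadoop. The function names, the two toy features (counts of the words "free" and "meeting"), and the hyperparameters are all hypothetical.

```python
from functools import reduce


def map_partition(weights, bias, partition):
    """Map step: sum hinge-loss subgradients over one data partition."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, label in partition:  # label is +1 (spam) or -1 (legitimate)
        score = sum(w * x for w, x in zip(weights, features)) + bias
        if label * score < 1.0:  # margin violated, so this example contributes
            for i, x in enumerate(features):
                grad_w[i] -= label * x
            grad_b -= label
    return grad_w, grad_b, len(partition)


def reduce_gradients(a, b):
    """Reduce step: add the partial results of two partitions."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1], a[2] + b[2]


def train_linear_svm(partitions, dim, epochs=200, lr=0.1, reg=0.01):
    """Driver: one map/reduce round per epoch, then a weight update."""
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        # On a cluster each partition would be processed by its own mapper;
        # here the mappers simply run one after another.
        partials = [map_partition(weights, bias, p) for p in partitions]
        grad_w, grad_b, n = reduce(reduce_gradients, partials)
        weights = [w - lr * (reg * w + g / n) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, features):
    """Return +1 for spam and -1 for legitimate mail."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


# Hypothetical toy data: [count of "free", count of "meeting"], label.
PARTITIONS = [
    [([3.0, 0.0], 1), ([0.0, 2.0], -1), ([2.0, 0.0], 1)],
    [([0.0, 3.0], -1), ([4.0, 1.0], 1), ([1.0, 4.0], -1)],
]

WEIGHTS, BIAS = train_linear_svm(PARTITIONS, dim=2)
```

Because each mapper only needs the current weights and its own slice of the data, no single machine has to hold the whole dataset, and the per-epoch work is divided among the partitions. These are the two training hurdles the abstract names. The speedup actually obtained depends on the cluster and on the per-round overhead, which this sketch does not model.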