{"title":"Research and Improvement of a Spam Filter Based on Naive Bayes","authors":"Lin Li, Chi Li","doi":"10.1109/IHMSC.2015.208","DOIUrl":null,"url":null,"abstract":"The spam filter based on Naive Bayes algorithm, which has good classification accuracy, but the training and learning mail sample sets takes a lot of resources, affects the overall efficiency of the system, so we should select the features of the message text in the practical application, and thus to reduce the dimension of the features vector space. TF-IDF is commonly used as a text feature selection, the method is simple, the paper improve the IDF weighting algorithm of the TF-IDF feature selection, increase the weight of the high frequency words corresponding its class, use the improved TF-IDF algorithm to select the features, and build a naive Bayesian spam filter improved TF-IDF feature weighting.","PeriodicalId":6592,"journal":{"name":"2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics","volume":"61 1","pages":"361-364"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IHMSC.2015.208","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12
Abstract
The spam filter based on Naive Bayes algorithm, which has good classification accuracy, but the training and learning mail sample sets takes a lot of resources, affects the overall efficiency of the system, so we should select the features of the message text in the practical application, and thus to reduce the dimension of the features vector space. TF-IDF is commonly used as a text feature selection, the method is simple, the paper improve the IDF weighting algorithm of the TF-IDF feature selection, increase the weight of the high frequency words corresponding its class, use the improved TF-IDF algorithm to select the features, and build a naive Bayesian spam filter improved TF-IDF feature weighting.