Research and Improvement of a Spam Filter Based on Naive Bayes

2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics. Pub Date: 2015-11-23. Pages: 361-364. DOI: 10.1109/IHMSC.2015.208

Lin Li, Chi Li

Citations: 12

Abstract

A spam filter based on the Naive Bayes algorithm achieves good classification accuracy, but training on mail sample sets consumes substantial resources and reduces the overall efficiency of the system. In practical applications, features of the message text should therefore be selected to reduce the dimension of the feature vector space. TF-IDF is a simple and commonly used method for text feature selection. This paper improves the IDF weighting in TF-IDF feature selection by increasing the weight of words that occur with high frequency in their corresponding class, uses the improved TF-IDF algorithm to select features, and builds a Naive Bayes spam filter with the improved TF-IDF feature weighting.
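As a rough illustration of the pipeline the abstract describes (select features by TF-IDF, then classify with Naive Bayes), here is a minimal Python sketch. The abstract does not give the modified IDF formula, so this sketch assumes the textbook weighting tf * log(N / df) and does not reproduce the paper's improvement. The function names (`tfidf_scores`, `select_features`, `train_nb`, `predict`), the toy corpus, and the choice of k = 4 are all hypothetical, and the classifier is assumed to be a multinomial Naive Bayes with Laplace smoothing.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Sum standard TF-IDF over the corpus for each term (assumed textbook form)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


def select_features(docs, k):
    """Keep the k terms with the highest summed TF-IDF score."""
    return {term for term, _ in tfidf_scores(docs).most_common(k)}


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes restricted to vocab, with Laplace smoothing."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(t for t in doc if t in vocab)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    return prior, loglik


def predict(model, doc):
    """Return the class with the highest log-posterior; unseen terms are ignored."""
    prior, loglik = model

    def score(c):
        return prior[c] + sum(loglik[c][t] for t in doc if t in loglik[c])

    return max(prior, key=score)


# Toy corpus (hypothetical data, already tokenized).
docs = [
    "win cash cash prize".split(),
    "win cash now cash".split(),
    "cash offer win win".split(),
    "meeting agenda meeting notes".split(),
    "project meeting notes meeting".split(),
    "meeting notes project lunch".split(),
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

vocab = select_features(docs, k=4)   # reduced feature space
model = train_nb(docs, labels, vocab)
print(sorted(vocab))
print(predict(model, "win cash today".split()))
print(predict(model, "project meeting tomorrow".split()))
```

Restricting the classifier to the selected vocabulary is what shrinks the feature vector space: the likelihood tables hold only k entries per class instead of one per distinct term in the corpus.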
