Spam email filtering with Bayesian belief network: using relevant words
X. Jin, Anbang Xu, R. Bie, Xian Shen, Min Yin
2006 IEEE International Conference on Granular Computing, 2006-05-10
DOI: 10.1109/GRC.2006.1635790
Citations: 9
Abstract
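In this paper, we report our work on a Bayesian Belief Network approach to spam email filtering (classifying email as spam or nonspam/legitimate). Our evaluation suggests that a Bayesian Belief Network-based classifier outperforms the popular Naive Bayes approach and two other well-known learners: decision tree and k-NN. These four algorithms are tested on two different data sets with three different feature selection methods (Information Gain, Gain Ratio and Chi Squared) for finding relevant words. Results of 10-fold cross-validation show that the Bayesian Belief Network performs best on both data sets. We suggest that this is because the 'dependent learner' characteristics of Bayesian Belief Network classification are better suited to spam filtering. The performance of the Bayesian Belief Network classifier could be further improved by careful feature subset selection.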
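The three feature-selection criteria can be made concrete with a small example. The Python sketch below scores one binary word feature (present or absent in a message) against the spam/legitimate label, and it rests on several assumptions. Messages are already tokenized into sets of words. Labels are 1 = spam and 0 = legitimate. All function names are hypothetical. Gain ratio is taken as information gain divided by the entropy of the word's presence split (the C4.5 definition). Chi-squared is the Pearson statistic on the 2×2 word/class table. The abstract does not say how the authors computed these scores or how many top-ranked words they kept, so read this as an illustration of the criteria, not as their pipeline.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def contingency(docs, labels, word):
    """2x2 table indexed as table[word_present][label] (assumed: 1 = spam)."""
    table = [[0, 0], [0, 0]]
    for doc, y in zip(docs, labels):
        table[int(word in doc)][y] += 1
    return table


def information_gain(table):
    """H(Class) - H(Class | word present/absent)."""
    class_counts = [table[0][c] + table[1][c] for c in (0, 1)]
    n = sum(class_counts)
    conditional = sum((sum(row) / n) * entropy(row) for row in table if sum(row) > 0)
    return entropy(class_counts) - conditional


def gain_ratio(table):
    """Information gain normalised by the entropy of the presence split."""
    split_info = entropy([sum(row) for row in table])
    return information_gain(table) / split_info if split_info > 0 else 0.0


def chi_squared(table):
    """Pearson chi-squared statistic of the 2x2 word/class table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in (0, 1)]
    chi2 = 0.0
    for r in (0, 1):
        for c in (0, 1):
            expected = row_totals[r] * col_totals[c] / n
            if expected > 0:
                chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


def rank_words(docs, labels, score):
    """Vocabulary sorted by descending score (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scored = {w: score(contingency(docs, labels, w)) for w in vocab}
    return sorted(scored, key=lambda w: (-scored[w], w))


if __name__ == "__main__":
    # Hypothetical toy corpus: two spam and two legitimate messages.
    docs = [{"free", "winner", "click"}, {"free", "offer"},
            {"meeting", "agenda"}, {"agenda", "lunch", "free"}]
    labels = [1, 1, 0, 0]
    for score in (information_gain, gain_ratio, chi_squared):
        print(score.__name__, rank_words(docs, labels, score)[:3])
```

In this toy corpus "agenda" appears in every legitimate message and no spam, so it gets the maximum information gain (1 bit) and a chi-squared value of 4. "free" occurs in both classes and scores much lower. A real filter would keep the top-k words under one criterion as the feature set for the classifiers.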
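The abstract credits the belief network's advantage to its "dependent learner" character. Naive Bayes treats every word as conditionally independent given the class, whereas a belief network can let the probability of one word depend on another word. The minimal Python sketch below illustrates that difference under stated assumptions. The network structure is fixed by hand through a hypothetical `extra_parent` map: every word has the class as a parent plus at most one other word. Probabilities are Laplace-smoothed counts over binary word-presence features. The class name and parameters are illustrative only. The abstract does not describe how the authors learned their network structure, so this is not their method.

```python
import math
from collections import Counter


class BeliefNetClassifier:
    """Bayesian network classifier over binary word-presence features.

    Assumed structure (hand-chosen, not learned): the class node is a parent
    of every word, and `extra_parent` may give a word one additional word
    parent. With no extra parents this is Laplace-smoothed naive Bayes.
    """

    def __init__(self, words, extra_parent=None, alpha=1.0):
        self.words = list(words)
        self.extra_parent = dict(extra_parent or {})
        self.alpha = alpha  # Laplace pseudo-count for every probability entry

    def _parent_value(self, x, word):
        # Value of the word's extra parent in message x, or None if it has none.
        parent = self.extra_parent.get(word)
        return None if parent is None else x[parent]

    def fit(self, X, y):
        """X: list of dicts mapping word -> 0/1; y: list of labels (1 = spam)."""
        self.n = len(y)
        self.class_counts = Counter(y)
        self.counts = Counter()         # (word, class, parent value, word value) -> n
        self.config_counts = Counter()  # (word, class, parent value) -> n
        for x, c in zip(X, y):
            for w in self.words:
                u = self._parent_value(x, w)
                self.counts[(w, c, u, x[w])] += 1
                self.config_counts[(w, c, u)] += 1
        return self

    def log_joint(self, x, c):
        """log P(c) + sum over words of log P(word | c, extra parent)."""
        a = self.alpha
        score = math.log((self.class_counts[c] + a) / (self.n + 2 * a))
        for w in self.words:
            u = self._parent_value(x, w)
            num = self.counts[(w, c, u, x[w])] + a
            den = self.config_counts[(w, c, u)] + 2 * a
            score += math.log(num / den)
        return score

    def predict(self, x):
        # Ties go to class 0 (legitimate), the first candidate passed to max().
        return max((0, 1), key=lambda c: self.log_joint(x, c))


if __name__ == "__main__":
    # Toy "XOR" data: a message is spam iff exactly one of the words a, b occurs.
    patterns = [({"a": 0, "b": 0}, 0), ({"a": 1, "b": 1}, 0),
                ({"a": 1, "b": 0}, 1), ({"a": 0, "b": 1}, 1)]
    X = [x for x, _ in patterns] * 5
    y = [c for _, c in patterns] * 5
    nb = BeliefNetClassifier(["a", "b"]).fit(X, y)
    bn = BeliefNetClassifier(["a", "b"], extra_parent={"b": "a"}).fit(X, y)
    for name, model in (("naive Bayes", nb), ("with edge a -> b", bn)):
        correct = sum(model.predict(x) == c for x, c in patterns)
        print(f"{name}: {correct}/4 patterns correct")
```

On this deliberately extreme toy data each word is equally likely in both classes, so naive Bayes gives every pattern the same score. It falls back to "legitimate" and gets 2 of the 4 patterns right. Adding the single edge a -> b lets the model learn P(b | class, a) and classify all four correctly. Real spam corpora are far less clean, but the example shows the kind of word-to-word dependency that an independence-assuming learner cannot represent.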