Adaptive spam filtering using dynamic feature space

Yan Zhou, M. Mulekar, Praveen Nerellapalli
Published in: 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05), 2005-11-14
DOI: 10.1109/ICTAI.2005.28 (https://doi.org/10.1109/ICTAI.2005.28)
Citations: 28

Abstract

Unsolicited bulk e-mail, also known as spam, is a growing problem for e-mail users. This paper presents a new spam filtering strategy that (1) uses a practical entropy coding technique, Huffman coding, to dynamically encode the feature space of e-mail collections over time, and (2) applies an online algorithm to adaptively enhance the learned spam concept as new e-mail data becomes available. The contributions of this work include a highly efficient spam filtering algorithm in which the input space is radically reduced to a single-dimension input vector, and an adaptive learning technique that is robust to vocabulary change, concept drift, and skewed data distributions. We compare our technique to several existing off-line learning techniques, including support vector machines, naive Bayes, k-nearest neighbor, C4.5 decision trees, RBFNetwork, boosted decision trees, and stacking, and demonstrate its effectiveness with experimental results on publicly available e-mail data.
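
The abstract does not spell out how the Huffman-coded feature space is built, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It builds generic Huffman code lengths from token frequencies and then collapses a message into one number. The function names, the `unseen_bits` penalty, and the "code length gap" score are all assumptions introduced here for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: Huffman code length in bits} for a frequency table."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs one bit to be written down.
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in d1.items()}
        merged.update({s: depth + 1 for s, depth in d2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def code_length_gap(tokens, spam_lengths, ham_lengths, unseen_bits=16):
    """Hypothetical one-dimensional feature (an assumption, not from the paper):
    bits needed under the ham code minus bits needed under the spam code.
    A positive value means the message is cheaper to encode as spam."""
    spam_bits = sum(spam_lengths.get(t, unseen_bits) for t in tokens)
    ham_bits = sum(ham_lengths.get(t, unseen_bits) for t in tokens)
    return ham_bits - spam_bits


# Toy usage with made-up token counts; the code tables could be rebuilt
# whenever the counts are updated with newly labeled mail.
spam_counts = Counter("free money free offer now free".split())
ham_counts = Counter("meeting notes attached meeting agenda".split())
spam_code = huffman_code_lengths(spam_counts)
ham_code = huffman_code_lengths(ham_counts)
score = code_length_gap("free offer".split(), spam_code, ham_code)
```

Frequent tokens get short codes, so rebuilding the tables as counts change lets the encoding follow vocabulary shifts. How the paper actually maps messages to its single-dimension input, and which online learner it applies, cannot be determined from the abstract alone.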