Fast and Effective Spam Sender Detection with Granular SVM on Highly Imbalanced Mail Server Behavior Data

Yuchun Tang, S. Krasser, P. Judge, Yanqing Zhang
Published in: 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 2006-11-01. DOI: 10.1109/COLCOM.2006.361856
Citations: 36

Abstract

Unsolicited commercial or bulk emails and emails containing viruses currently pose a great threat to the utility of email communications. A recent filtering solution is reputation systems, which assign a trust value to each IP address that sends email messages. By analyzing the query patterns of each participating node, a reputation system can calculate a reputation score for each queried IP address and serve as a platform for global collaborative spam filtering for all participating nodes. In this research, we explore a behavioral classification approach based on spectral sender characteristics retrieved from such global messaging patterns. Due to the large number of bad senders, this classification task has to cope with highly imbalanced data. To solve this challenging problem, a novel granular support vector machine - boundary alignment algorithm (GSVM-BA) is designed. GSVM-BA looks for the optimal decision boundary by repeatedly removing positive support vectors from the training dataset and rebuilding another SVM. Compared to the original SVM algorithm with cost-sensitive learning, GSVM-BA demonstrates superior performance on spam IP detection, in terms of both effectiveness and efficiency.
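The abstract describes GSVM-BA in a single sentence, so the following is only a rough sketch of the remove-and-retrain loop, not the authors' implementation. It rests on several assumptions that are not stated in the abstract: the positive class is taken to be the over-represented one, a linear soft-margin SVM trained by batch sub-gradient descent stands in for whatever SVM the paper uses, a "support vector" is approximated as a training point on or inside the margin, and the fixed round cap `max_rounds` is a hypothetical stopping rule. All function and parameter names are illustrative.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, c=10.0, lr=0.01, epochs=300):
    """Batch sub-gradient descent on 0.5*||w||^2 + c * mean(hinge loss).

    A stand-in for a real SVM solver; labels must be +1 or -1.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regularizer is w
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # point violates the margin
                grad_w = [g - c * yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= c * yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


def gsvm_ba_sketch(X, y, max_rounds=3):
    """Repeatedly drop positive margin points and retrain (assumed reading)."""
    X, y = list(X), list(y)
    w, b = train_linear_svm(X, y)
    for _ in range(max_rounds):  # hypothetical stopping rule
        # Approximate positive support vectors: positives on or inside the margin.
        sv = {i for i, (xi, yi) in enumerate(zip(X, y))
              if yi == 1 and dot(w, xi) + b <= 1.0}
        n_pos = sum(1 for yi in y if yi == 1)
        if not sv or len(sv) == n_pos:
            break  # nothing to remove, or removal would empty the positive class
        X = [xi for i, xi in enumerate(X) if i not in sv]
        y = [yi for i, yi in enumerate(y) if i not in sv]
        w, b = train_linear_svm(X, y)
    return w, b, X, y


# Synthetic, made-up data: 200 majority (positive) and 20 minority (negative) points.
rng = random.Random(0)
X_train = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(200)]
X_train += [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
y_train = [1] * 200 + [-1] * 20

w, b, X_kept, y_kept = gsvm_ba_sketch(X_train, y_train)
print("positives kept:", y_kept.count(1), "negatives kept:", y_kept.count(-1))
```

Each round shrinks only the positive class, so the retrained boundary is pushed toward the majority side while every minority sample stays in the training set. The training set also gets smaller each round, which is one plausible reading of the efficiency claim in the abstract.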