Baeza-Yates and Navarro approximate string matching for spam filtering

M. Aldwairi, Y. Flaifel
Published in: Second International Conference on the Innovative Computing Technology (INTECH 2012), 2012-09-01
DOI: 10.1109/INTECH.2012.6457802 (https://doi.org/10.1109/INTECH.2012.6457802)
Citations: 12

Abstract

Spam has evolved in content, methods, delivery networks, and volume. Reports indicate that up to 90% of worldwide email traffic is spam [1]. Its content now covers a wider range, moving beyond the conventional pharmaceutical and adult material toward more formal marketing campaigns. This illegal advertising is evolving into an underground market in which bot masters rent or sell spam agents. Spam campaigns increasingly adopt new methods to ensure efficient mass delivery and to evade conventional spam detectors, relying on large and complex infrastructures of botnets and fast-flux networks to deliver as many emails as possible. The main concerns in spam detection are detection accuracy and misclassification, and both remain a challenge because the techniques employed by spammers keep evolving. In this paper we propose a spam filtering system built on bit-parallel string matching, based on the improved Baeza-Yates and Navarro approximate string matching algorithm. The method has a low computational cost, is easy to implement, and has the potential to catch misspelled keywords. The proposed approach achieves 97.2% overall accuracy with a simple Naive Bayes classifier.
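To illustrate how bit-parallel approximate matching can catch a misspelled keyword, here is a minimal Python sketch. It is not the paper's implementation: it uses the simpler row-wise Shift-And formulation with k errors (usually attributed to Wu and Manber), not the diagonal-wise Baeza-Yates and Navarro scheme that the abstract names. The function name `find_approx`, its parameters, and the example keyword are assumptions made for illustration only.

```python
def find_approx(pattern, text, k):
    """Return the end indices in `text` where `pattern` matches with at
    most `k` edits (insertions, deletions, substitutions).

    Row-wise bit-parallel Shift-And with k errors (Wu-Manber style);
    a stand-in for illustration, not the Baeza-Yates and Navarro variant.
    """
    if not pattern or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")
    m = len(pattern)
    full = (1 << m) - 1        # keep every state vector to m bits
    accept = 1 << (m - 1)      # bit set when the whole pattern has matched

    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # R[d] bit i set: pattern[:i+1] matches a suffix of the text read so
    # far with at most d errors. Initially d pattern chars can be deleted.
    R = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = R[0]                        # old value of row d-1
        R[0] = ((R[0] << 1) | 1) & b & full    # exact-match row
        for d in range(1, k + 1):
            cur_old = R[d]
            match = ((cur_old << 1) | 1) & b   # next char matches
            insert = prev_old                  # extra char in the text
            subst_or_del = ((prev_old | R[d - 1]) << 1) | 1
            R[d] = (match | insert | subst_or_del) & full
            prev_old = cur_old
        if R[k] & accept:
            ends.append(j)
    return ends


# Hypothetical example: an obfuscated keyword is found with one error.
print(find_approx("viagra", "buy v1agra now", 1))  # [9]
```

Each text character costs a constant number of word operations per error level, which is where the low computational cost of this family of algorithms comes from when the keyword fits in a machine word.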