Detecting SMS Spam in the Age of Legitimate Bulk Messaging

Bradley Reaves, Logan Blue, D. Tian, Patrick Traynor, Kevin R. B. Butler
DOI: 10.1145/2939918.2939937 (https://doi.org/10.1145/2939918.2939937)
Published in: Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks
Publication date: 2016-07-18
Citations: 22

Abstract

Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.
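
The abstract reports its results as precision and recall. As a reminder of how those two metrics are computed, here is a minimal sketch using the standard definitions. The function name and the counts below are hypothetical illustrations, not figures from the paper.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) for the spam class.

    precision = TP / (TP + FP): of the messages flagged as spam, how many were spam.
    recall    = TP / (TP + FN): of the actual spam messages, how many were flagged.
    A metric with an empty denominator is reported as 0.0 here (a convention
    chosen for this sketch).
    """
    flagged = true_pos + false_pos
    actual_spam = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual_spam if actual_spam else 0.0
    return precision, recall


# Hypothetical counts: a classifier that flags many legitimate bulk
# messages (false positives) has low precision even if recall is decent.
print(precision_recall(true_pos=60, false_pos=140, false_neg=40))

# Hypothetical counts: with no false positives, precision is 1.0,
# while missed spam (false negatives) still lowers recall.
print(precision_recall(true_pos=90, false_pos=0, false_neg=10))
```

Under these definitions, a precision of 100% means that no legitimate message was flagged as spam, while a recall below 100% means that some spam went undetected.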