在合法批量消息时代检测垃圾短信

Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks Pub Date : 2016-07-18 DOI:10.1145/2939918.2939937

Bradley Reaves, Logan Blue, D. Tian, Patrick Traynor, Kevin R. B. Butler

{"title":"在合法批量消息时代检测垃圾短信","authors":"Bradley Reaves, Logan Blue, D. Tian, Patrick Traynor, Kevin R. B. Butler","doi":"10.1145/2939918.2939937","DOIUrl":null,"url":null,"abstract":"Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.","PeriodicalId":387704,"journal":{"name":"Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Detecting SMS Spam in the Age of Legitimate Bulk Messaging\",\"authors\":\"Bradley Reaves, Logan Blue, D. Tian, Patrick Traynor, Kevin R. B. Butler\",\"doi\":\"10.1145/2939918.2939937\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.\",\"PeriodicalId\":387704,\"journal\":{\"name\":\"Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2939918.2939937\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2939918.2939937","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 22

摘要

世界上使用短信的人比使用其他任何通信技术的人都多。因此，它为垃圾邮件发送者提供了理想的媒介。虽然这个问题已经被许多研究人员研究了多年，但最近合法的批量流量(例如，帐户验证，2FA等)的增加极大地改变了这个领域中看到的流量组合，降低了以前垃圾邮件分类工作的有效性。本文演示了在包含大量和垃圾消息的大规模文本消息语料库上使用这些检测器时的性能下降。对于我们收集的超过14个月的标记短信数据集，过去分类器的准确率和召回率分别下降到23.8%和61.3%。然而，使用我们的分类技术和标记聚类，准确率和召回率分别提高到100%和96.8%。我们不仅表明我们收集的数据集有助于纠正以前研究中看到的许多过度训练错误，而且还提供了对当前SMS垃圾邮件活动的一些见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Detecting SMS Spam in the Age of Legitimate Bulk Messaging

Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks

自引率

0.00%

发文量