平台无关的反垃圾邮件网关

Yihe Zhang, Xu Yuan, N. Tzeng
{"title":"平台无关的反垃圾邮件网关","authors":"Yihe Zhang, Xu Yuan, N. Tzeng","doi":"10.1145/3485832.3488024","DOIUrl":null,"url":null,"abstract":"This paper addresses a novel anti-spam gateway targeting multiple linguistic-based social platforms to expose the outlier property of their spam messages uniformly for effective detection. Instead of labeling ground truth datasets and extracting key features, which are labor-intensive and time-consuming, we start with coarsely mining seed corpora of spams and hams from the target data (aiming for spam classification), before reconstructing them as the reference. To catch each word’s rich information in the semantic and syntactic perspectives, we then leverage the natural language processing (NLP) model to embed each word into the high-dimensional vector space and use a neural network to train a spam word model. After that, each message is encoded by using the predicted spam scores from this model for all included stem words. The encoded messages are processed by the prominent outlier techniques to produce their respective scores, allowing us to rank them for making the outlier visible. Our solution is unsupervised, without relying on specifics of any platform or dataset, to be platform-oblivious. Through extensive experiments, our solution is demonstrated to expose spammers’ outlier characteristics effectively, outperform all examined unsupervised methods in almost all metrics, and may even better supervised counterparts.","PeriodicalId":175869,"journal":{"name":"Annual Computer Security Applications Conference","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Platform-Oblivious Anti-Spam Gateway\",\"authors\":\"Yihe Zhang, Xu Yuan, N. Tzeng\",\"doi\":\"10.1145/3485832.3488024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper addresses a novel anti-spam gateway targeting multiple linguistic-based social platforms to expose the outlier property of their spam messages uniformly for effective detection. Instead of labeling ground truth datasets and extracting key features, which are labor-intensive and time-consuming, we start with coarsely mining seed corpora of spams and hams from the target data (aiming for spam classification), before reconstructing them as the reference. To catch each word’s rich information in the semantic and syntactic perspectives, we then leverage the natural language processing (NLP) model to embed each word into the high-dimensional vector space and use a neural network to train a spam word model. After that, each message is encoded by using the predicted spam scores from this model for all included stem words. The encoded messages are processed by the prominent outlier techniques to produce their respective scores, allowing us to rank them for making the outlier visible. Our solution is unsupervised, without relying on specifics of any platform or dataset, to be platform-oblivious. Through extensive experiments, our solution is demonstrated to expose spammers’ outlier characteristics effectively, outperform all examined unsupervised methods in almost all metrics, and may even better supervised counterparts.\",\"PeriodicalId\":175869,\"journal\":{\"name\":\"Annual Computer Security Applications Conference\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annual Computer Security Applications Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3485832.3488024\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Computer Security Applications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3485832.3488024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

本文提出了一种针对多语言社交平台的新型反垃圾邮件网关,以统一暴露其垃圾邮件的离群值属性,从而实现有效的检测。代替标记地面真实数据集和提取关键特征,这是劳动密集型和耗时的,我们从目标数据中粗略地挖掘垃圾邮件和火腿的种子语料库(旨在进行垃圾邮件分类),然后重建它们作为参考。为了从语义和句法的角度捕捉每个词的丰富信息,我们利用自然语言处理(NLP)模型将每个词嵌入到高维向量空间中,并使用神经网络训练垃圾词模型。之后,使用该模型预测的所有包含的词干词的垃圾邮件分数对每条消息进行编码。编码后的消息由杰出的离群值技术处理,以产生各自的分数,允许我们对它们进行排序,使离群值可见。我们的解决方案是无监督的,不依赖于任何平台或数据集的细节,是平台无关的。通过大量的实验,我们的解决方案被证明可以有效地暴露垃圾邮件发送者的异常特征,在几乎所有指标上都优于所有经过检查的无监督方法,甚至可能更好地监督对应方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Platform-Oblivious Anti-Spam Gateway
This paper addresses a novel anti-spam gateway targeting multiple linguistic-based social platforms to expose the outlier property of their spam messages uniformly for effective detection. Instead of labeling ground truth datasets and extracting key features, which are labor-intensive and time-consuming, we start with coarsely mining seed corpora of spams and hams from the target data (aiming for spam classification), before reconstructing them as the reference. To catch each word’s rich information in the semantic and syntactic perspectives, we then leverage the natural language processing (NLP) model to embed each word into the high-dimensional vector space and use a neural network to train a spam word model. After that, each message is encoded by using the predicted spam scores from this model for all included stem words. The encoded messages are processed by the prominent outlier techniques to produce their respective scores, allowing us to rank them for making the outlier visible. Our solution is unsupervised, without relying on specifics of any platform or dataset, to be platform-oblivious. Through extensive experiments, our solution is demonstrated to expose spammers’ outlier characteristics effectively, outperform all examined unsupervised methods in almost all metrics, and may even better supervised counterparts.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信