Leveraging Phone Numbers for Spam Detection in Online Social Networks

2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI). Publication date: 2021-01-21. DOI: 10.1109/SAMI50585.2021.9378644

R. Jere, Anant Pandey, Manvi Singh, Mandar Ganjapurkar

Citations: 1

Abstract

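Online Social Networks (OSNs) have gained immense traction in society today. Social media has reshaped our social world and plays a pivotal role in shaping our personal and professional goals. While it provides invaluable information to millions of people daily, it has also become one of the most popular venues for spam campaigns. In this paper, we design an algorithm for recognizing spam campaigns with a specific focus on phone numbers, and build a spam-campaign recognition system around them in light of the malicious activity that is degrading our online experience. The research focuses on data collected by monitoring three social networking channels: Tumblr, Twitter, and Flickr, and analyzes spam posts accumulated over a period of four months. We collected over 18 million spam posts and, during data cleaning, used regular expressions to identify and filter the posts that contain phone numbers. Next, we used Latent Dirichlet Allocation (LDA), a Bayesian statistical model, to determine the category of each post. Finally, we represented the posts with bag-of-words and tf-idf features and applied cosine similarity as the similarity measure.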
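The regular-expression filter and the bag-of-words / tf-idf / cosine-similarity steps described above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the phone-number pattern (North-American 10-digit format), the normalization to the last 10 digits, the smoothed idf formula, the 0.5 similarity threshold, and the helper names `extract_phones`, `tfidf_vectors`, `cosine` and `similar_pairs` are all illustrative assumptions. The LDA categorization step is not sketched.

```python
import math
import re
from collections import Counter

# Hypothetical pattern for North-American-style numbers such as
# "555-123-4567", "(555) 123 4567" or "+1 555.123.4567"; the paper's
# actual regular expressions are not given in the abstract.
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
WORD_RE = re.compile(r"[a-z]+")


def extract_phones(text):
    """Return phone numbers found in `text`, normalized to their last 10 digits."""
    return [re.sub(r"\D", "", m)[-10:] for m in PHONE_RE.findall(text)]


def tokenize(text):
    """Lower-case alphabetic tokens (digits are ignored for the bag of words)."""
    return WORD_RE.findall(text.lower())


def tfidf_vectors(docs):
    """Bag-of-words counts weighted by a smoothed idf, log((1 + N) / (1 + df)) + 1."""
    bags = [Counter(tokenize(d)) for d in docs]
    n = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in bag.items()} for bag in bags]


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_pairs(posts, threshold=0.5):
    """Keep only posts that contain a phone number, then return the kept posts and
    the index pairs (into the kept list) whose cosine similarity >= threshold."""
    kept = [p for p in posts if extract_phones(p)]
    vecs = tfidf_vectors(kept)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return kept, pairs


if __name__ == "__main__":
    posts = [
        "Cheap watches! Call 555-123-4567 today",
        "cheap watches, call +1 (555) 123 4567 today",
        "Lovely sunset photo from the beach",
        "Win a free cruise, text 555.987.6543 now",
    ]
    kept, pairs = similar_pairs(posts)
    print(len(kept), pairs)  # 3 [(0, 1, 1.0)]
```

In this toy run the photo post is dropped by the phone-number filter, and the two "cheap watches" posts, which share the same normalized number, are also textually identical. The pairwise loop is quadratic, so at the scale of 18 million posts it would need to be replaced by sparse-matrix or approximate nearest-neighbour methods.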
