Complex-network based model for SMS spam filtering
Shaghayegh Hosseinpour, Hadi Shakibian
Computer Networks, volume 255, Article 110892, published 2024-11-09. DOI: 10.1016/j.comnet.2024.110892
https://www.sciencedirect.com/science/article/pii/S1389128624007242
With advances in technology and the widespread use of mobile phones and wireless communication, SMS has become the most popular texting method, owing to its high response rate, its affordability, and the fact that it requires no internet connection. A survey found that 3.5 billion users, or 80% of active users worldwide, use SMS for communication. SMS has also attracted spammers, however, resulting in an explosion of spam messages, especially in Asia. Spam messages serve various purposes, such as advertising, adult content, smishing, and fraud; they annoy recipients, cost them money, and waste their time. Because spam is a problem for both users and providers, a mechanism is needed to identify and filter it out. We propose a novel supervised machine learning approach, based on complex network theory, for classifying messages as spam or ham. The approach combines complex-network-based features with statistical TF-IDF features and grammatical-rule features. An under-sampling method is used to cope with the imbalanced data. We evaluated several supervised learners in terms of accuracy, precision, recall, F1-score, and AUC. In our experiments, Random Forest classified spam messages more accurately than statistical methods that extracted only TF-IDF features.
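The abstract does not say how a message is turned into a network or which network measures are used, so the following is only a minimal sketch under stated assumptions: each message becomes a word co-occurrence graph in which adjacent words are linked, and a few common graph statistics serve as features. The function names (`cooccurrence_graph`, `network_features`, `undersample`) and the choice of features are hypothetical illustrations, not the authors' method. The sketch also shows simple random under-sampling of the majority class, which is one possible reading of the under-sampling step.

```python
import random
from collections import defaultdict


def cooccurrence_graph(text):
    """Build an undirected graph in which adjacent words share an edge (assumed construction)."""
    words = text.lower().split()
    graph = defaultdict(set)
    for word in words:
        graph[word]  # touch the key so every word appears as a node
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return graph


def network_features(text):
    """Return a few illustrative graph statistics for one message."""
    graph = cooccurrence_graph(text)
    n_nodes = len(graph)
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    avg_degree = 2 * n_edges / n_nodes if n_nodes else 0.0

    # Local clustering coefficient: share of a node's neighbour pairs that are linked.
    clustering = []
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        nbr_list = list(nbrs)
        links = sum(
            1
            for i, a in enumerate(nbr_list)
            for b in nbr_list[i + 1:]
            if b in graph[a]
        )
        clustering.append(2 * links / (k * (k - 1)))
    avg_clustering = sum(clustering) / n_nodes if n_nodes else 0.0

    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "avg_degree": avg_degree,
        "avg_clustering": avg_clustering,
    }


def undersample(messages, labels, seed=0):
    """Randomly drop items until every class has as many items as the smallest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for message, label in zip(messages, labels):
        by_label[label].append(message)
    target = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        for message in rng.sample(items, target):
            balanced.append((message, label))
    rng.shuffle(balanced)
    return balanced
```

In a fuller pipeline, the dictionary returned by `network_features` would be concatenated with TF-IDF and grammatical-rule features before being passed to a classifier such as Random Forest; that step is omitted here because the abstract gives no details about it.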
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.