Malicious SMS detection using ensemble learning and SMOTE to improve mobile cybersecurity

IF 4.8 | Zone 2, Computer Science | JCR Q1, Computer Science, Information Systems

Computers & Security | Pub Date: 2025-03-31 | DOI: 10.1016/j.cose.2025.104443

Hongsheng Xu, Akeel Qadir, Saima Sadiq

Computers & Security, Volume 154, Article 104443. Full text: https://www.sciencedirect.com/science/article/pii/S0167404825001324

Citations: 0

Abstract

The widespread use of cell phones, along with their constant internet connection, makes them vulnerable to malicious SMS attacks, including smishing and spam. Smishing involves attempts to steal personal information, while spam consists of unwanted advertisements. Both pose cybersecurity threats and call for effective filtering techniques. Researchers have devised multiple methods for detecting malicious SMS, yet a notable gap remains in creating algorithms that reduce false positives, where normal messages are wrongly classified as malicious. The proposed method employs ensemble learning to automatically classify messages as malicious or legitimate. It combines Support Vector Machine and Random Forest models and is compared with individual machine learning approaches for smishing detection. Term Frequency (TF) and Term Frequency–Inverse Document Frequency (TF–IDF) are used to extract features from the data. Class imbalance in the dataset is addressed by applying the Synthetic Minority Oversampling Technique (SMOTE). The ensemble model outperformed the individual models, reaching an accuracy of 99.58% when trained with TF–IDF features on the balanced dataset. The proposed approach offers proactive defense against malicious SMS attacks, enhancing cybersecurity in the mobile communications sector.
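The two preprocessing steps named in the abstract, TF–IDF feature extraction and SMOTE oversampling, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract does not give the exact TF–IDF formula, the SMOTE parameters, or the dataset, so the smoothed IDF variant, the neighbour count `k`, the function names `tfidf_vectors` and `smote`, and the toy messages below are all assumptions made for illustration. The SVM and Random Forest ensemble itself is omitted.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): raw term counts times a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of messages that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, one common variant (assumed; the paper's formula is not given).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy messages invented for illustration: 1 = malicious, 0 = legitimate.
messages = [
    ("see you at lunch tomorrow", 0),
    ("can you call me later", 0),
    ("meeting moved to friday", 0),
    ("happy birthday have a great day", 0),
    ("running late be there soon", 0),
    ("thanks for the update", 0),
    ("verify your bank account now", 1),
    ("you won a free prize claim now", 1),
    ("urgent confirm your account password", 1),
]

vocab, X = tfidf_vectors([text for text, _ in messages])
labels = [label for _, label in messages]

# Oversample the malicious class until both classes have the same size.
minority = [x for x, y in zip(X, labels) if y == 1]
n_needed = labels.count(0) - labels.count(1)
X_balanced = X + smote(minority, n_needed)
y_balanced = labels + [1] * n_needed

print(Counter(y_balanced))
```

In a real experiment, oversampling would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data. Library implementations such as scikit-learn's `TfidfVectorizer` and `VotingClassifier` and imbalanced-learn's `SMOTE` would likely replace these hand-written functions; whether the authors used them is not stated in the abstract.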

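Source journal

Computers & Security (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 12.40

Self-citation rate: 7.10%

Articles published: 365

Review time: 10.7 months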

About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.