DaE2: Unmasking malicious URLs by leveraging diverse and efficient ensemble machine learning for online security

IF 4.8 · CAS Zone 2 (Computer Science) · JCR Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computers & Security · Pub Date: 2024-10-28 · DOI: 10.1016/j.cose.2024.104170

Abiodun Esther Omolara, Moatsum Alawida

Volume 148, Article 104170 · Full text: https://www.sciencedirect.com/science/article/pii/S0167404824004759

Citations: 0

Abstract

Over 5.44 billion people now use the Internet, making it a vital part of daily life that enables communication, e-commerce, education, and more. However, this widespread connectivity also raises concerns about online privacy and security, particularly with the rise of malicious Uniform Resource Locators (URLs). Recently, conventional ensemble models have attracted attention for their notable benefits: reducing model variance, improving predictive accuracy, and demonstrating high generalization potential. However, their application to the challenge of malicious URLs remains an open problem. These URLs often hide behind static links in emails or web pages, posing a threat to individuals and organizations. Despite blacklisting services, many harmful sites evade detection because of inadequate scrutiny or recent creation. Hence, to improve URL detection, a Diverse and Efficient Ensemble (DaE2) machine learning algorithm was developed that uses four ensemble models (AdaBoost, Bagging, Stacking, and Voting) to classify URLs. After preprocessing, the experimental results showed that all models achieved over 80% accuracy, with AdaBoost reaching 98.5% and Stacking offering the fastest runtime. AdaBoost and Bagging also delivered strong performance, with F1 scores of 0.980 and 0.976, respectively.
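As a rough illustration of one of the four ensemble styles named in the abstract, the sketch below shows hard (majority) voting over three simple base classifiers, together with the F1 score used to report results. It is not the authors' DaE2 implementation: the abstract does not list the features or base learners, so `url_features`, the three rules, and all thresholds are hypothetical, and the example URLs are made up. AdaBoost, Bagging, and Stacking are not shown.

```python
from collections import Counter
from urllib.parse import urlparse


def url_features(url):
    """Extract a few lexical features (hypothetical; not the paper's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "length": len(url),
        "num_dots": host.count("."),
        "has_at": "@" in url,
        "host_is_ip": host.replace(".", "").isdigit(),
        "uses_https": parsed.scheme == "https",
    }


# Three deliberately simple base classifiers (1 = malicious, 0 = benign).
# The thresholds are illustrative assumptions, not tuned values.
def rule_long_url(f):
    return int(f["length"] > 60)


def rule_odd_host(f):
    return int(f["host_is_ip"] or f["num_dots"] > 3)


def rule_at_or_no_https(f):
    return int(f["has_at"] or not f["uses_https"])


BASE_CLASSIFIERS = [rule_long_url, rule_odd_host, rule_at_or_no_https]


def vote(url):
    """Hard voting: return the label predicted by most base classifiers."""
    features = url_features(url)
    votes = [clf(features) for clf in BASE_CLASSIFIERS]
    # With three binary voters there is always a strict majority.
    return Counter(votes).most_common(1)[0][0]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up URLs for demonstration only.
    for u in ["https://example.com/docs", "http://example.org/a"]:
        print(u, "->", vote(u))
```

In a real pipeline the hand-written rules would be replaced by trained models, and the vote would be taken over their predictions on held-out data before computing F1.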


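Source journal: Computers & Security (Engineering & Technology; Computer Science: Information Systems)

CiteScore: 12.40

Self-citation rate: 7.10%

Articles published: 365

Review time: 10.7 months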

About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.