E-WebGuard：用于精确检测网络攻击的增强型神经架构

IF 4.8 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computers & Security Pub Date : 2024-09-23 DOI:10.1016/j.cose.2024.104127

Luchen Zhou , Wei-Chuen Yau , Y.S. Gan , Sze-Teng Liong

{"title":"E-WebGuard：用于精确检测网络攻击的增强型神经架构","authors":"Luchen Zhou , Wei-Chuen Yau , Y.S. Gan , Sze-Teng Liong","doi":"10.1016/j.cose.2024.104127","DOIUrl":null,"url":null,"abstract":"<div><div>Web applications have become a favored tool for organizations to disseminate vast amounts of information to the public. With the increasing adoption and inherent openness of these applications, there is an observed surge in web-based attacks exploited by adversaries. However, most of the web attack detection works are based on public datasets that are outdated or do not cover a sufficient quantity of web application attacks. Furthermore, most of them are binary detection (i.e., normal or attack) and there is little work on multi-class web attack detection. This highlights the crucial need for automated web attack detection models to bolster web security. In this study, a suite of integrated machine learning and deep learning models is designed to detect web attacks. Specifically, this study employs the Character-level Support Vector Machine (Char-SVM), Character-level Long Short-Term Memory (Char-LSTM), Convolutional Neural Network - SVM (CNN-SVM), and CNN-Bi-LSTM models to differentiate between standard HTTP requests and HTTP-based attacks in both the CSIC 2010 and SR-BH 2020 datasets. Note that the CSIC 2010 dataset involves binary classification, while the SR-BH 2020 dataset involves multi-class classification, specifically with 13 classes. Notably, the input data is first converted to the character level before being fed into any of the proposed model architectures. In the binary classification task, the Char-SVM model with a linear kernel outperforms other models, achieving an accuracy rate of 99.60%. The CNN-Bi-LSTM model closely follows with a 99.41% accuracy, surpassing the performance of the CNN-LSTM model presented in previous research. In the context of multi-class classification, the CNN-Bi-LSTM model demonstrates outstanding performance with a 99.63% accuracy rate. Furthermore, the multi-class classification models, namely Char-LSTM and CNN-Bi-LSTM, achieve validation accuracies above 98%, outperforming the two machine learning-based methods mentioned in the original research.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104127"},"PeriodicalIF":4.8000,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"E-WebGuard: Enhanced neural architectures for precision web attack detection\",\"authors\":\"Luchen Zhou , Wei-Chuen Yau , Y.S. Gan , Sze-Teng Liong\",\"doi\":\"10.1016/j.cose.2024.104127\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Web applications have become a favored tool for organizations to disseminate vast amounts of information to the public. With the increasing adoption and inherent openness of these applications, there is an observed surge in web-based attacks exploited by adversaries. However, most of the web attack detection works are based on public datasets that are outdated or do not cover a sufficient quantity of web application attacks. Furthermore, most of them are binary detection (i.e., normal or attack) and there is little work on multi-class web attack detection. This highlights the crucial need for automated web attack detection models to bolster web security. In this study, a suite of integrated machine learning and deep learning models is designed to detect web attacks. Specifically, this study employs the Character-level Support Vector Machine (Char-SVM), Character-level Long Short-Term Memory (Char-LSTM), Convolutional Neural Network - SVM (CNN-SVM), and CNN-Bi-LSTM models to differentiate between standard HTTP requests and HTTP-based attacks in both the CSIC 2010 and SR-BH 2020 datasets. Note that the CSIC 2010 dataset involves binary classification, while the SR-BH 2020 dataset involves multi-class classification, specifically with 13 classes. Notably, the input data is first converted to the character level before being fed into any of the proposed model architectures. In the binary classification task, the Char-SVM model with a linear kernel outperforms other models, achieving an accuracy rate of 99.60%. The CNN-Bi-LSTM model closely follows with a 99.41% accuracy, surpassing the performance of the CNN-LSTM model presented in previous research. In the context of multi-class classification, the CNN-Bi-LSTM model demonstrates outstanding performance with a 99.63% accuracy rate. Furthermore, the multi-class classification models, namely Char-LSTM and CNN-Bi-LSTM, achieve validation accuracies above 98%, outperforming the two machine learning-based methods mentioned in the original research.</div></div>\",\"PeriodicalId\":51004,\"journal\":{\"name\":\"Computers & Security\",\"volume\":\"148 \",\"pages\":\"Article 104127\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2024-09-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167404824004322\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404824004322","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

网络应用程序已成为企业向公众传播大量信息的首选工具。随着这些应用的日益普及和固有的开放性，我们观察到被对手利用的基于网络的攻击激增。然而，大多数网络攻击检测工作都是基于过时的公共数据集，或者没有涵盖足够数量的网络应用程序攻击。此外，大多数检测都是二元检测（即正常或攻击），很少有关于多类网络攻击检测的工作。这凸显了对自动网络攻击检测模型的迫切需要，以加强网络安全。本研究设计了一套集成机器学习和深度学习模型来检测网络攻击。具体来说，本研究采用了字符级支持向量机（Char-SVM）、字符级长短期记忆（Char-LSTM）、卷积神经网络-SVM（CNN-SVM）和 CNN-Bi-LSTM 模型来区分 CSIC 2010 和 SR-BH 2020 数据集中的标准 HTTP 请求和基于 HTTP 的攻击。请注意，CSIC 2010 数据集涉及二元分类，而 SR-BH 2020 数据集涉及多类分类，特别是 13 类。值得注意的是，在将输入数据输入到任何建议的模型架构之前，首先要将其转换为字符级。在二元分类任务中，采用线性核的 Char-SVM 模型优于其他模型，准确率达到 99.60%。CNN-Bi-LSTM 模型紧随其后，准确率达到 99.41%，超过了之前研究中 CNN-LSTM 模型的表现。在多类分类方面，CNN-Bi-LSTM 模型表现突出，准确率达到 99.63%。此外，多类分类模型（即 Char-LSTM 和 CNN-Bi-LSTM）的验证准确率超过 98%，优于原始研究中提到的两种基于机器学习的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

E-WebGuard: Enhanced neural architectures for precision web attack detection

Web applications have become a favored tool for organizations to disseminate vast amounts of information to the public. With the increasing adoption and inherent openness of these applications, there is an observed surge in web-based attacks exploited by adversaries. However, most of the web attack detection works are based on public datasets that are outdated or do not cover a sufficient quantity of web application attacks. Furthermore, most of them are binary detection (i.e., normal or attack) and there is little work on multi-class web attack detection. This highlights the crucial need for automated web attack detection models to bolster web security. In this study, a suite of integrated machine learning and deep learning models is designed to detect web attacks. Specifically, this study employs the Character-level Support Vector Machine (Char-SVM), Character-level Long Short-Term Memory (Char-LSTM), Convolutional Neural Network - SVM (CNN-SVM), and CNN-Bi-LSTM models to differentiate between standard HTTP requests and HTTP-based attacks in both the CSIC 2010 and SR-BH 2020 datasets. Note that the CSIC 2010 dataset involves binary classification, while the SR-BH 2020 dataset involves multi-class classification, specifically with 13 classes. Notably, the input data is first converted to the character level before being fed into any of the proposed model architectures. In the binary classification task, the Char-SVM model with a linear kernel outperforms other models, achieving an accuracy rate of 99.60%. The CNN-Bi-LSTM model closely follows with a 99.41% accuracy, surpassing the performance of the CNN-LSTM model presented in previous research. In the context of multi-class classification, the CNN-Bi-LSTM model demonstrates outstanding performance with a 99.63% accuracy rate. Furthermore, the multi-class classification models, namely Char-LSTM and CNN-Bi-LSTM, achieve validation accuracies above 98%, outperforming the two machine learning-based methods mentioned in the original research.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computers & Security 工程技术-计算机：信息系统

CiteScore

12.40

自引率

7.10%

发文量

365

审稿时长

10.7 months

期刊介绍： Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.