IGXSS: XSS payload detection model based on inductive GCN

IF 2.6 | Zone 4 (Computer Science) | JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

International Journal of Network Management, Volume 34, Issue 6. Pub Date: 2024-02-11. DOI: 10.1002/nem.2264

Qiuhua Wang, Chuangchuang Li, Dong Wang, Lifeng Yuan, Gaoning Pan, Yanyu Cheng, Mingde Hu, Yizhi Ren


Citations: 0

Abstract

To simplify management, Internet of Things (IoT) vendors usually manage IoT devices uniformly through remote channels such as HTTP services. As a result, traditional web application vulnerabilities, such as cross-site scripting (XSS), code injection, and remote command/code execution (RCE), also endanger the cloud interfaces of IoT. XSS is one of the most common web application attacks; it allows an attacker to obtain private user information or to attack IoT devices and IoT cloud platforms. Most existing XSS payload detection models are based on machine learning or deep learning and usually require substantial external resources, such as pretrained word vectors, to perform well on unknown samples. In XSS payload detection, however, high-quality vector representations of samples are often difficult to obtain. In addition, existing models all perform substantially worse when the distribution of XSS payloads and benign samples in the test dataset is extremely unbalanced (e.g., XSS payloads : benign samples = 1:20). In real XSS attacks against IoT, a payload is often hidden among a massive number of normal user requests, so these models are not practical. To address these issues, we propose IGXSS, an XSS payload detection model based on inductive graph neural networks (specifically, an inductive graph convolutional network, GCN), to detect XSS payloads targeting IoT. First, we treat the samples, and the words obtained by segmenting them, as nodes and connect them with edges to form a graph. Then, we obtain the feature matrix of nodes and edges using only information between nodes (instead of external resources such as pretrained word vectors). Finally, we feed the obtained feature matrix into a two-layer GCN for training and validate the model's performance on several datasets with different sample distributions. Extensive experiments on real datasets show that IGXSS outperforms other models under various sample distributions. In particular, when the sample distribution is extremely unbalanced, the recall and F1 score of IGXSS still reach 1.000 and 0.846, respectively, demonstrating that IGXSS is more robust and better suited to practical scenarios.
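The three steps in the abstract (graph construction, node features derived from the graph itself, and a two-layer GCN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the tokenizer, the edge weights, or the feature scheme, so everything here is an assumption made for illustration: the regex in `tokenize`, the TF-IDF weighting of sample-word edges, the choice of TF-IDF rows for sample nodes and one-hot rows for word nodes, the hidden size of 16, and the random (untrained) weights `w1` and `w2`.

```python
import re
import numpy as np

def tokenize(sample):
    # Hypothetical segmentation rule: alphanumeric runs or single symbols.
    return re.findall(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]", sample.lower())

def build_graph(samples):
    """Build one graph whose nodes are the samples followed by the words."""
    docs = [tokenize(s) for s in samples]
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs, n_words = len(docs), len(vocab)
    adj = np.eye(n_docs + n_words)  # self-loops on every node

    # Document frequency of each word, used for a smoothed IDF (assumed weighting).
    df = np.zeros(n_words)
    for d in docs:
        for t in set(d):
            df[index[t]] += 1
    idf = np.log((1 + n_docs) / (1 + df)) + 1

    # Sample-word edges weighted by TF-IDF; the graph is kept symmetric.
    for i, d in enumerate(docs):
        for t in d:
            j = n_docs + index[t]
            adj[i, j] += idf[index[t]] / len(d)
            adj[j, i] = adj[i, j]
    return adj, n_docs, vocab

def normalize(adj):
    # Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are > 0 due to self-loops.
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_hat, x, w1, w2):
    # Two graph-convolution layers: ReLU after the first, softmax after the second.
    h = np.maximum(a_hat @ x @ w1, 0.0)
    logits = a_hat @ h @ w2
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

samples = [
    "<script>alert(1)</script>",
    "q=weather today",
    "<img src=x onerror=alert(1)>",
]
adj, n_docs, vocab = build_graph(samples)

# Features use only graph information: TF-IDF rows for samples, one-hot rows for words.
x = adj[:, n_docs:]

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(x.shape[1], 16))  # untrained weights
w2 = rng.normal(scale=0.1, size=(16, 2))           # two classes: benign, XSS
probs = gcn_forward(normalize(adj), x, w1, w2)
sample_probs = probs[:n_docs]  # class probabilities for the sample nodes only
```

Because the weights are random, `sample_probs` carries no predictive meaning; a real model would train `w1` and `w2` with a cross-entropy loss on the labeled sample nodes. The point of the sketch is only that the feature matrix `x` is computed from the samples themselves, with no pretrained word vectors.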




Source journal

International Journal of Network Management (COMPUTER SCIENCE, INFORMATION SYSTEMS; TELECOMMUNICATIONS)

CiteScore: 5.10

Self-citation rate: 6.70%

Review time: >12 weeks

About the journal: Modern computer networks and communication systems are increasing in size, scope, and heterogeneity. The promise of a single end-to-end technology has not been realized and likely never will be. The decreasing cost of bandwidth is extending the possible applications of computer networks and communication systems to entirely new domains. Problems in integrating heterogeneous wired and wireless technologies, ensuring security and quality of service, and reliably operating large-scale systems, including those that incorporate cloud computing, have all emerged as important topics. The one constant is the need for network management. Challenges in network management have never been greater than they are today. The International Journal of Network Management is the forum for researchers, developers, and practitioners in network management to present their work to an international audience. The journal is dedicated to the dissemination of information that will enable improved management, operation, and maintenance of computer networks and communication systems. The journal is peer reviewed and publishes original papers (both theoretical and experimental) by leading researchers, practitioners, and consultants from universities, research laboratories, and companies around the world. Issues with thematic or guest-edited special topics typically appear several times per year. Topic areas for the journal are largely defined by the taxonomy for network and service management developed by IFIP WG6.6, together with IEEE-CNOM, the IRTF-NMRG, and the Emanics Network of Excellence.