利用衍生变量确保网络流量分类模型免受对抗性示例的影响

IF 3.6 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Future Internet Pub Date : 2023-12-16 DOI:10.3390/fi15120405

J. M. Adeke, Guangjie Liu, Junjie Zhao, Nannan Wu, Hafsat Muhammad Bashir

{"title":"利用衍生变量确保网络流量分类模型免受对抗性示例的影响","authors":"J. M. Adeke, Guangjie Liu, Junjie Zhao, Nannan Wu, Hafsat Muhammad Bashir","doi":"10.3390/fi15120405","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) models are essential to securing communication networks. However, these models are vulnerable to adversarial examples (AEs), in which malicious inputs are modified by adversaries to produce the desired output. Adversarial training is an effective defense method against such attacks but relies on access to a substantial number of AEs, a prerequisite that entails significant computational resources and the inherent limitation of poor performance on clean data. To address these problems, this study proposes a novel approach to improve the robustness of ML-based network traffic classification models by integrating derived variables (DVars) into training. Unlike adversarial training, our approach focuses on enhancing training using DVars, introducing randomness into the input data. DVars are generated from the baseline dataset and significantly improve the resilience of the model to AEs. To evaluate the effectiveness of DVars, experiments were conducted using the CSE-CIC-IDS2018 dataset and three state-of-the-art ML-based models: decision tree (DT), random forest (RF), and k-neighbors (KNN). The results show that DVars can improve the accuracy of KNN under attack from 0.45% to 0.84% for low-intensity attacks and from 0.32% to 0.66% for high-intensity attacks. Furthermore, both DT and RF achieve a significant increase in accuracy when subjected to attack of different intensity. Moreover, DVars are computationally efficient, scalable, and do not require access to AEs.","PeriodicalId":37982,"journal":{"name":"Future Internet","volume":"21 3","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Securing Network Traffic Classification Models against Adversarial Examples Using Derived Variables\",\"authors\":\"J. M. Adeke, Guangjie Liu, Junjie Zhao, Nannan Wu, Hafsat Muhammad Bashir\",\"doi\":\"10.3390/fi15120405\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning (ML) models are essential to securing communication networks. However, these models are vulnerable to adversarial examples (AEs), in which malicious inputs are modified by adversaries to produce the desired output. Adversarial training is an effective defense method against such attacks but relies on access to a substantial number of AEs, a prerequisite that entails significant computational resources and the inherent limitation of poor performance on clean data. To address these problems, this study proposes a novel approach to improve the robustness of ML-based network traffic classification models by integrating derived variables (DVars) into training. Unlike adversarial training, our approach focuses on enhancing training using DVars, introducing randomness into the input data. DVars are generated from the baseline dataset and significantly improve the resilience of the model to AEs. To evaluate the effectiveness of DVars, experiments were conducted using the CSE-CIC-IDS2018 dataset and three state-of-the-art ML-based models: decision tree (DT), random forest (RF), and k-neighbors (KNN). The results show that DVars can improve the accuracy of KNN under attack from 0.45% to 0.84% for low-intensity attacks and from 0.32% to 0.66% for high-intensity attacks. Furthermore, both DT and RF achieve a significant increase in accuracy when subjected to attack of different intensity. Moreover, DVars are computationally efficient, scalable, and do not require access to AEs.\",\"PeriodicalId\":37982,\"journal\":{\"name\":\"Future Internet\",\"volume\":\"21 3\",\"pages\":\"\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2023-12-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Internet\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/fi15120405\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Internet","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/fi15120405","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

机器学习（ML）模型对于确保通信网络安全至关重要。然而，这些模型很容易受到对抗范例（AE）的攻击，在对抗范例中，恶意输入会被对抗者修改以产生所需的输出。对抗训练是抵御此类攻击的有效方法，但它依赖于对大量 AE 的访问，而访问 AE 的前提条件是需要大量的计算资源，其固有的限制是在干净数据上的性能较差。为了解决这些问题，本研究提出了一种新方法，通过将衍生变量（DVars）整合到训练中来提高基于 ML 的网络流量分类模型的鲁棒性。与对抗训练不同，我们的方法侧重于使用 DVars 增强训练，将随机性引入输入数据。DVars 由基线数据集生成，可显著提高模型对 AE 的适应能力。为了评估 DVars 的有效性，我们使用 CSE-CIC-IDS2018 数据集和三种最先进的基于 ML 的模型进行了实验：决策树（DT）、随机森林（RF）和 KNN（Kneighbors）。结果表明，在低强度攻击下，DVars 可以将 KNN 的准确率从 0.45% 提高到 0.84%；在高强度攻击下，DVars 可以将 KNN 的准确率从 0.32% 提高到 0.66%。此外，当受到不同强度的攻击时，DT 和 RF 都能显著提高准确率。此外，DVars 的计算效率高、可扩展，而且不需要访问 AE。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Securing Network Traffic Classification Models against Adversarial Examples Using Derived Variables

Machine learning (ML) models are essential to securing communication networks. However, these models are vulnerable to adversarial examples (AEs), in which malicious inputs are modified by adversaries to produce the desired output. Adversarial training is an effective defense method against such attacks but relies on access to a substantial number of AEs, a prerequisite that entails significant computational resources and the inherent limitation of poor performance on clean data. To address these problems, this study proposes a novel approach to improve the robustness of ML-based network traffic classification models by integrating derived variables (DVars) into training. Unlike adversarial training, our approach focuses on enhancing training using DVars, introducing randomness into the input data. DVars are generated from the baseline dataset and significantly improve the resilience of the model to AEs. To evaluate the effectiveness of DVars, experiments were conducted using the CSE-CIC-IDS2018 dataset and three state-of-the-art ML-based models: decision tree (DT), random forest (RF), and k-neighbors (KNN). The results show that DVars can improve the accuracy of KNN under attack from 0.45% to 0.84% for low-intensity attacks and from 0.32% to 0.66% for high-intensity attacks. Furthermore, both DT and RF achieve a significant increase in accuracy when subjected to attack of different intensity. Moreover, DVars are computationally efficient, scalable, and do not require access to AEs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Future Internet Computer Science-Computer Networks and Communications

CiteScore

7.10

自引率

5.90%

发文量

303

审稿时长

11 weeks

期刊介绍： Future Internet is a scholarly open access journal which provides an advanced forum for science and research concerned with evolution of Internet technologies and related smart systems for “Net-Living” development. The general reference subject is therefore the evolution towards the future internet ecosystem, which is feeding a continuous, intensive, artificial transformation of the lived environment, for a widespread and significant improvement of well-being in all spheres of human life (private, public, professional). Included topics are: • advanced communications network infrastructures • evolution of internet basic services • internet of things • netted peripheral sensors • industrial internet • centralized and distributed data centers • embedded computing • cloud computing • software defined network functions and network virtualization • cloud-let and fog-computing • big data, open data and analytical tools • cyber-physical systems • network and distributed operating systems • web services • semantic structures and related software tools • artificial and augmented intelligence • augmented reality • system interoperability and flexible service composition • smart mission-critical system architectures • smart terminals and applications • pro-sumer tools for application design and development • cyber security compliance • privacy compliance • reliability compliance • dependability compliance • accountability compliance • trust compliance • technical quality of basic services.