Reliable feature selection for adversarially robust cyber-attack detection
João Vitorino, Miguel Silva, Eva Maia, Isabel Praça
Annals of Telecommunications, Journal Article, published 2024-06-07. Impact factor: 1.8 (JCR Q3, Telecommunications).
DOI: https://doi.org/10.1007/s12243-024-01047-z
Abstract
Growing cybersecurity threats make it essential to train machine learning (ML) models for network traffic analysis on high-quality data, free of noisy or missing values. Selecting the most relevant features for cyber-attack detection can improve both the robustness and the computational efficiency of the models used in a cybersecurity system. This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets. Two different feature sets were selected and used to train multiple ML models with regular and adversarial training. Finally, an adversarial evasion robustness benchmark was performed to analyze the reliability of the different feature sets and their impact on the susceptibility of the models to adversarial examples. By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models achieved better adversarially robust generalization. Their robustness improved significantly without affecting their generalization to regular traffic flows, without an increase in false alarms, and without requiring excessive computational resources, which enables reliable detection of suspicious activity and perturbed traffic flows in enterprise computer networks.
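The abstract does not say which selection methods are combined or how the consensus is reached, so the following is only a minimal sketch of one plausible reading: several scoring methods each vote for their top-ranked features, and a feature is kept when enough methods agree. The two scoring functions (`abs_pearson`, `stump_accuracy`), the voting rule, the parameters `top_k` and `min_votes`, and the toy feature names are all assumptions made for illustration, not the authors' procedure or data.

```python
from collections import Counter
from statistics import mean, pstdev


def abs_pearson(xs, ys):
    """Absolute Pearson correlation between a feature and the binary label."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(cov) / (sx * sy)


def stump_accuracy(xs, ys):
    """Best accuracy of a single-threshold rule (either direction)."""
    best = 0.0
    for t in set(xs):
        hits = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        best = max(best, max(hits, len(ys) - hits) / len(ys))
    return best


def consensus_features(columns, labels, top_k, min_votes,
                       methods=(abs_pearson, stump_accuracy)):
    """Keep features ranked in the top_k by at least min_votes methods."""
    votes = Counter()
    for method in methods:
        scores = {name: method(vals, labels) for name, vals in columns.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        votes.update(ranked[:top_k])
    return sorted(name for name, n in votes.items() if n >= min_votes)


# Toy flows (hypothetical values): label 1 marks an attack flow.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "flow_duration": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "pkt_rate": [10, 12, 11, 13, 20, 22, 21, 19],
    "ttl_noise": [5, 9, 5, 9, 5, 9, 5, 9],  # unrelated to the label
}
selected = consensus_features(columns, labels, top_k=2, min_votes=2)
print(selected)  # ['flow_duration', 'pkt_rate']
```

Requiring agreement between methods is one way to make the selected set less dependent on the quirks of any single ranking criterion; a real pipeline would use stronger selectors and validate the resulting feature sets on held-out and adversarially perturbed data.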
About the journal:
Annals of Telecommunications is an international journal publishing original peer-reviewed papers in the field of telecommunications. It covers all the essential branches of modern telecommunications, ranging from digital communications to communication networks and the internet, to software, protocols and services, uses and economics. This broad spectrum of topics accounts for the rapid convergence, through telecommunications, of the underlying technologies in computers, communications, and content management toward the emergence of the information and knowledge society. As a consequence, the journal provides a medium for exchanging research results and technological achievements accomplished by the European and international scientific community from academia and industry.