Advanced IDS: a comparative study of datasets and machine learning algorithms for network flow-based intrusion detection systems

IF 3.4 2区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jose Carlos Mondragon, Paula Branco, Guy-Vincent Jourdan, Andres Eduardo Gutierrez-Rodriguez, Rajesh Roshan Biswal
{"title":"Advanced IDS: a comparative study of datasets and machine learning algorithms for network flow-based intrusion detection systems","authors":"Jose Carlos Mondragon,&nbsp;Paula Branco,&nbsp;Guy-Vincent Jourdan,&nbsp;Andres Eduardo Gutierrez-Rodriguez,&nbsp;Rajesh Roshan Biswal","doi":"10.1007/s10489-025-06422-4","DOIUrl":null,"url":null,"abstract":"<div><p>Globally, cyberattacks are growing and mutating each month. Intelligent Intrusion Network Detection Systems are developed to analyze and detect anomalous traffic to face these threats. A way to address this is by using network flows, an aggregated version of communications between devices. Network Flow datasets are used to train Artificial Intelligence (AI) models to classify specific attacks. Training these models requires threat samples usually generated synthetically in labs as capturing them on operational network is a challenging task. As threats are fast-evolving, new network flows are continuously developed and shared. However, using old datasets is still a popular procedure when testing models, hindering a more comprehensive characterization of the advantages and opportunities of recent solutions on new attacks. Moreover, a standardized benchmark is missing rendering a poor comparison between the models produced by algorithms. To address these gaps, we present a benchmark with fourteen recent and preprocessed datasets and study seven categories of algorithms for Network Intrusion Detection based on Network Flows. We provide a centralized source of pre-processed datasets to researchers for easy download. All dataset are also provided with a train, validation and test split to allow a straightforward and fair comparison between existing and new solutions. We selected open state-of-the-art publicly available algorithms, representatives of diverse approaches. We carried out an experimental comparison using the Macro F1 score of these algorithms. Our results highlight each model operation on dataset scenarios and provide guidance on competitive solutions. Finally, we discuss the main characteristics of the models and benchmarks, focusing on practical implications and recommendations for practitioners and researchers.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 7","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-025-06422-4.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06422-4","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Globally, cyberattacks are growing and mutating each month. Intelligent Intrusion Network Detection Systems are developed to analyze and detect anomalous traffic to face these threats. A way to address this is by using network flows, an aggregated version of communications between devices. Network Flow datasets are used to train Artificial Intelligence (AI) models to classify specific attacks. Training these models requires threat samples usually generated synthetically in labs as capturing them on operational network is a challenging task. As threats are fast-evolving, new network flows are continuously developed and shared. However, using old datasets is still a popular procedure when testing models, hindering a more comprehensive characterization of the advantages and opportunities of recent solutions on new attacks. Moreover, a standardized benchmark is missing rendering a poor comparison between the models produced by algorithms. To address these gaps, we present a benchmark with fourteen recent and preprocessed datasets and study seven categories of algorithms for Network Intrusion Detection based on Network Flows. We provide a centralized source of pre-processed datasets to researchers for easy download. All dataset are also provided with a train, validation and test split to allow a straightforward and fair comparison between existing and new solutions. We selected open state-of-the-art publicly available algorithms, representatives of diverse approaches. We carried out an experimental comparison using the Macro F1 score of these algorithms. Our results highlight each model operation on dataset scenarios and provide guidance on competitive solutions. Finally, we discuss the main characteristics of the models and benchmarks, focusing on practical implications and recommendations for practitioners and researchers.

在全球范围内,网络攻击每月都在增长和变异。为应对这些威胁,开发了智能入侵网络检测系统来分析和检测异常流量。解决这一问题的一种方法是使用网络流(设备间通信的汇总版本)。网络流数据集用于训练人工智能(AI)模型,以对特定攻击进行分类。训练这些模型需要通常在实验室中合成的威胁样本,因为在运行网络中捕获这些样本是一项具有挑战性的任务。随着威胁的快速发展,新的网络流也在不断开发和共享。然而,在测试模型时,使用旧数据集仍然是一种流行的程序,这阻碍了对新攻击中最新解决方案的优势和机遇进行更全面的描述。此外,标准化基准的缺失也会导致算法所生成的模型之间无法进行很好的比较。为了弥补这些不足,我们提出了一个包含 14 个最新预处理数据集的基准,并研究了基于网络流的七类网络入侵检测算法。我们为研究人员提供了一个集中的预处理数据集源,便于下载。所有数据集还提供了训练、验证和测试分区,以便对现有解决方案和新解决方案进行直接、公平的比较。我们选择了最先进的公开算法,它们是各种方法的代表。我们使用这些算法的 Macro F1 分数进行了实验比较。我们的结果突出了每个模型在数据集场景下的运行情况,并为有竞争力的解决方案提供了指导。最后,我们讨论了模型和基准的主要特点,重点是对从业人员和研究人员的实际影响和建议。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Applied Intelligence
Applied Intelligence 工程技术-计算机:人工智能
CiteScore
6.60
自引率
20.80%
发文量
1361
审稿时长
5.9 months
期刊介绍: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信