Jose Carlos Mondragon, Paula Branco, Guy-Vincent Jourdan, Andres Eduardo Gutierrez-Rodriguez, Rajesh Roshan Biswal
Advanced IDS: a comparative study of datasets and machine learning algorithms for network flow-based intrusion detection systems
Applied Intelligence, vol. 55, no. 7. Published 2025-04-01. DOI: 10.1007/s10489-025-06422-4
https://link.springer.com/article/10.1007/s10489-025-06422-4
Citations: 0
Abstract
Globally, cyberattacks are growing and mutating each month. Intelligent Network Intrusion Detection Systems are developed to analyze and detect anomalous traffic in response to these threats. One way to do this is to use network flows, an aggregated representation of the communications between devices. Network flow datasets are used to train Artificial Intelligence (AI) models to classify specific attacks. Training these models requires threat samples, which are usually generated synthetically in labs because capturing them on operational networks is challenging. As threats evolve quickly, new network flow datasets are continuously developed and shared. However, testing models on old datasets is still common practice, which hinders a more comprehensive characterization of the advantages and opportunities of recent solutions on new attacks. Moreover, no standardized benchmark exists, so the models produced by different algorithms are poorly compared. To address these gaps, we present a benchmark with fourteen recent, preprocessed datasets and study seven categories of algorithms for Network Intrusion Detection based on network flows. We provide researchers with a centralized source of preprocessed datasets for easy download. All datasets are also provided with a train, validation and test split to allow a straightforward and fair comparison between existing and new solutions. We selected open, publicly available state-of-the-art algorithms that represent diverse approaches. We carried out an experimental comparison of these algorithms using the Macro F1 score. Our results highlight how each model performs across dataset scenarios and provide guidance on competitive solutions. Finally, we discuss the main characteristics of the models and benchmarks, focusing on practical implications and recommendations for practitioners and researchers.
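The abstract names the Macro F1 score as the comparison metric but does not spell out how it is computed. The following is a minimal sketch of the standard definition (the unweighted mean of per-class F1 scores), not the authors' evaluation code. The function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data (not from the paper): 8 benign flows, 2 attack flows.
y_true = ["benign"] * 8 + ["attack"] * 2
y_pred = ["benign"] * 10  # a classifier that never flags an attack

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

Because every class contributes equally to the average, a model that ignores a rare attack class is penalized even when its accuracy looks high, which is a common reason to prefer this metric on imbalanced intrusion detection data.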
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.