Evaluating the Impact of Data Preprocessing Techniques on the Performance of Intrusion Detection Systems
Kelson Carvalho Santos, Rodrigo Sanches Miani, Flávio de Oliveira Silva
Journal of Network and Systems Management, published 2024-03-22. DOI: 10.1007/s10922-024-09813-z
Abstract
The development of Intrusion Detection Systems using Machine Learning techniques (ML-based IDS) has emerged as an important research topic in cybersecurity. However, there is a noticeable lack of systematic studies on the usability of such systems in real-world applications. This paper analyzes the impact of data preprocessing techniques on the performance of ML-based IDS using two public datasets, UNSW-NB15 and CIC-IDS2017. Specifically, we evaluated how data cleaning, encoding, and normalization techniques affect binary and multiclass intrusion detection models, and how different ML-based IDS respond to these techniques. To answer these questions, we implemented a machine learning pipeline that applies the preprocessing techniques across different scenarios. Analysis with the Friedman test and the Nemenyi post-hoc test revealed significant differences, according to the evaluation metrics, among groups of data preprocessing techniques and among ML-based IDS. However, these differences were not observed for data preprocessing techniques in the multiclass scenarios. Additionally, the ML-based IDS performed differently in binary and multiclass classification. Our investigation therefore offers insights into the efficacy of different data preprocessing techniques for building robust and accurate intrusion detection models.
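The three preprocessing families named in the abstract (cleaning, encoding, normalization) can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the column names (`proto`, `dur`, `sbytes`), and the specific choices (dropping rows with missing or non-finite values and exact duplicates, one-hot encoding, min-max and z-score scaling) are assumptions made for illustration, and `clean`, `one_hot`, `min_max`, and `z_score` are hypothetical helpers written in plain Python to keep the example self-contained.

```python
import math
from statistics import mean, pstdev


def clean(rows):
    """Drop rows with missing (None) or non-finite (NaN/inf) values, then exact duplicates."""
    seen, kept = set(), []
    for row in rows:
        if any(v is None or (isinstance(v, float) and not math.isfinite(v)) for v in row.values()):
            continue
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per observed category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new)
    return encoded


def min_max(rows, columns):
    """Rescale each numeric column to [0, 1]; constant columns become 0.0."""
    out = [dict(r) for r in rows]  # copy so the input rows are not mutated
    for c in columns:
        lo, hi = min(r[c] for r in rows), max(r[c] for r in rows)
        for r in out:
            r[c] = (r[c] - lo) / (hi - lo) if hi > lo else 0.0
    return out


def z_score(rows, columns):
    """Standardize each numeric column to zero mean and unit (population) standard deviation."""
    out = [dict(r) for r in rows]
    for c in columns:
        values = [r[c] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in out:
            r[c] = (r[c] - mu) / sigma if sigma > 0 else 0.0
    return out


# Hypothetical flow records (illustrative values, not taken from either dataset).
raw = [
    {"proto": "tcp", "dur": 0.5, "sbytes": 200.0},
    {"proto": "udp", "dur": 1.5, "sbytes": 400.0},
    {"proto": "tcp", "dur": 0.5, "sbytes": 200.0},            # exact duplicate
    {"proto": "icmp", "dur": float("nan"), "sbytes": 100.0},  # missing value
    {"proto": "udp", "dur": 2.5, "sbytes": float("inf")},     # non-finite value
    {"proto": "tcp", "dur": 2.5, "sbytes": 600.0},
]
cleaned = clean(raw)
encoded = one_hot(cleaned, "proto")
scaled = min_max(encoded, ["dur", "sbytes"])
standardized = z_score(encoded, ["dur", "sbytes"])
```

In an actual experiment, the scaling statistics (min/max or mean/std) would be computed on the training split only and reused on the test split to avoid data leakage; scikit-learn's `OneHotEncoder`, `MinMaxScaler`, and `StandardScaler` follow that fit/transform pattern.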
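The statistical comparison mentioned in the abstract can be sketched under the usual formulation for comparing several methods over several scenarios: rank the methods within each scenario, compute the Friedman statistic from the average ranks, then use the Nemenyi critical difference (CD) to decide which pairs of average ranks differ significantly. The score matrix below is invented for illustration and is not a result from the paper; the helper names and the table `Q_ALPHA_005` (two-tailed Nemenyi critical values at α = 0.05, as tabulated by Demšar) are assumptions of this sketch.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05, keyed by the number of methods k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(scores):
    """Rank one row so the highest score gets rank 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every method tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # 0-based positions i..j map to ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def friedman(matrix):
    """Friedman statistic for N rows (scenarios) x k columns (methods), without tie correction.

    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), where R_j is method j's average rank.
    """
    n, k = len(matrix), len(matrix[0])
    ranks = [rank_row(row) for row in matrix]
    avg_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return chi2, avg_ranks


def nemenyi_cd(k, n, q_table=Q_ALPHA_005):
    """Critical difference CD = q_alpha * sqrt(k(k+1) / (6N)) between average ranks."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6 * n))


def significant_pairs(names, avg_ranks, cd):
    """Pairs of methods whose average ranks differ by more than the critical difference."""
    return [(names[a], names[b])
            for a in range(len(names)) for b in range(a + 1, len(names))
            if abs(avg_ranks[a] - avg_ranks[b]) > cd]


# Hypothetical F1-scores: 4 scenarios (rows) x 3 methods A, B, C (columns).
names = ["A", "B", "C"]
scores = [
    [0.95, 0.90, 0.80],
    [0.97, 0.91, 0.85],
    [0.93, 0.89, 0.70],
    [0.96, 0.92, 0.88],
]
chi2, avg_ranks = friedman(scores)
cd = nemenyi_cd(k=len(names), n=len(scores))
print(chi2, avg_ranks, round(cd, 3), significant_pairs(names, avg_ranks, cd))
```

Here the statistic is 8.0; with k - 1 = 2 degrees of freedom the 5% chi-square critical value is about 5.99, so the null hypothesis of equal performance would be rejected, and the Nemenyi step flags only the A vs. C pair (rank gap 2.0 against a CD of roughly 1.66). If SciPy is available, `scipy.stats.friedmanchisquare` computes the same statistic (with a tie correction) together with a p-value.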
About the journal:
The Journal of Network and Systems Management features peer-reviewed original research, as well as case studies, in the fields of network and system management. The journal regularly disseminates significant new information on both the telecommunications and computing aspects of these fields, as well as their evolution and emerging integration. This outstanding quarterly covers architecture, analysis, design, software, standards, and migration issues related to the operation, management, and control of distributed systems and communication networks for voice, data, video, and networked computing.