Empirical evaluation of feature selection methods for machine learning based intrusion detection in IoT scenarios

Impact Factor: 6.0; Zone 3 (Computer Science); JCR Q1 (Computer Science, Information Systems)

Internet of Things, Volume 28, Article 101367; Pub Date: 2024-09-07; DOI: 10.1016/j.iot.2024.101367

José García, Jorge Entrena, Álvaro Alesanco


Citations: 0

Abstract

This paper addresses the need for stronger security in the Internet of Things (IoT), where inherent device vulnerabilities leave deployments susceptible to many forms of cyber-attack, and it emphasizes the role of Intrusion Detection Systems (IDS) in continuous threat monitoring. The objective of the study was to evaluate feature selection (FS) methods comprehensively, using several machine learning (ML) techniques to classify traffic flows in datasets that contain intrusions in IoT environments. An extensive benchmark of ML techniques and FS methods was performed, covering three FS approaches: Filter Feature Ranking (FFR), Filter Feature Subset Selection (FSS), and Wrapper-based Feature Selection (WFS). FS is pivotal when handling large volumes of IoT data because it removes irrelevant attributes, mitigates the curse of dimensionality, improves model interpretability, and saves resources on devices with limited capacity. Key findings indicate that certain tree-based algorithms, such as J48 and PART, outperform the other ML techniques tested (naive Bayes, multi-layer perceptron, logistic regression, adaptive boosting, and k-Nearest Neighbors) for traffic flow classification, offering a good balance between performance and execution time. The advantages and drawbacks of the FS methods are discussed, highlighting the main differences in the results obtained with each approach. FSS approaches such as CFS could be more suitable than FFR, which may select correlated attributes, or than WFS methods, which may tailor attribute subsets to a specific ML technique and have lengthy execution times. In any case, reducing attributes via FS allowed classification to be optimized without compromising accuracy. In most experiments conducted across four datasets, in both binary and multiclass modes, F1 scores above 0.99 were achieved together with a reduction of over 60% in the number of attributes. This work emphasizes the importance of a balanced attribute selection process that takes into account both threat detection capability and computational complexity.
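The contrast the abstract draws between filter ranking (FFR) and filter subset selection (FSS) can be illustrated with a small sketch. This is not the authors' pipeline: the toy data, the feature names, the use of Pearson correlation as the relevance measure, and the greedy forward search are all assumptions made for illustration. The subset score follows the CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-to-class correlation and r_ff the mean feature-to-feature correlation of a k-feature subset.

```python
import math

# Hand-made toy flows. Feature names and values are hypothetical, not from the paper.
LABEL = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = attack
COLUMNS = {
    "pkt_count":   [1, 1, 3, 3, 3, 3, 5, 5],
    "byte_count":  [3, 1, 9, 11, 11, 9, 17, 19],   # near-duplicate of pkt_count
    "mean_iat_ms": [0, 22, 0, 22, 20, 42, 20, 42],
    "port_parity": [1, 0, 0, 1, 1, 0, 0, 1],       # unrelated to the label
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def filter_ranking(columns, target, top_k):
    """FFR-style selection: score each feature on its own, keep the top_k."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def cfs_merit(subset, columns, target):
    """CFS merit of a subset: rewards relevance, penalizes redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (
        sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns, target):
    """FSS-style selection: greedily add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        cand, merit = max(
            ((f, cfs_merit(selected + [f], columns, target)) for f in remaining),
            key=lambda pair: pair[1],
        )
        if merit <= best:  # stop when no candidate improves the subset
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best


if __name__ == "__main__":
    print("ranking, top 2:", filter_ranking(COLUMNS, LABEL, 2))
    print("CFS subset:    ", cfs_forward(COLUMNS, LABEL))
```

On this toy data the ranking keeps `pkt_count` and `byte_count`, two strongly correlated attributes, while the subset search keeps `pkt_count` and `mean_iat_ms` and leaves out the near-duplicate. A wrapper method would replace `cfs_merit` with the cross-validated score of a specific classifier, which ties the subset to that classifier and costs one training run per candidate.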


Source journal: Internet of Things
CiteScore: 3.60
Self-citation rate: 5.10%
Articles published: 115
Review time: 37 days

About the journal: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross-collaboration between researchers, engineers and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT. The journal will place a high priority on timely publication, and provide a home for high quality. Furthermore, IOT is interested in publishing topical Special Issues on any aspect of IOT.