Machine Learning Algorithms and Datasets for Modern IDS Design

2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom). Pub Date: 2022-06-16. DOI: 10.1109/CyberneticsCom55287.2022.9865255

Inam Abdullah Abdulmajeed, I. Husien


Citations: 1

Abstract

An Intrusion Detection System (IDS) is a critical component of cyber security: it captures and analyzes traffic, differentiates between benign and malicious traffic, and indicates the attack type. This review investigates the Machine Learning (ML) algorithms used in IDS design, with particular focus on the datasets used. The parameters used to compare the performance of each algorithm are also studied. Dataset choice is critical, because the dataset must match the IDS requirements, and the dataset structure can strongly influence the selection of the ML algorithm. Hence, a metric provides a numerical relation between an ML algorithm and a specific dataset. The review concludes that researchers are moving away from Supervised Learning toward Clustering and other algorithms, which gives hope that future IDSs will be able to detect more unknown and zero-day attacks; the share of studies using hybrid algorithms has also increased dramatically. Recent ML research also relies increasingly on modern datasets, a significant consideration in IDS design, although some research articles still treat KDDCup99 and its reduced variant as the principal IDS training dataset, despite the fact that it is more than 20 years old and that cyber threats keep rising as new technologies such as cloud computing, IoT, and IPv6 are adopted.
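As an illustration of how a metric gives a numerical relation between an algorithm and a dataset, the sketch below computes several widely used binary classification metrics from true and predicted traffic labels. The abstract does not list which parameters the review compares, so the metric set, the function name `ids_metrics`, the label strings, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def ids_metrics(y_true, y_pred, positive="attack"):
    """Compute common binary metrics from true and predicted labels.

    The label "attack" is a hypothetical positive class chosen for this
    sketch; any other label is treated as benign.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    tp, fn, fp, tn = counts["tp"], counts["fn"], counts["fp"], counts["tn"]

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Toy labels, invented for illustration only (not taken from any dataset).
y_true = ["attack", "attack", "benign", "benign", "benign", "attack"]
y_pred = ["attack", "benign", "benign", "attack", "benign", "attack"]
metrics = ids_metrics(y_true, y_pred)
```

Running the same function on the predictions of different algorithms over the same dataset yields directly comparable numbers. Attack traffic is often a small fraction of a dataset, so accuracy alone can hide a poor detector, which is one reason to read recall and false positive rate alongside it.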


