Machine Learning techniques for Behavioral Feature Selection in Network Intrusion Detection Systems

11th International Conference of Pattern Recognition Systems (ICPRS 2021). DOI: 10.1049/icp.2021.1448

Vicente Martínez, Rodrigo Salas, Oliver Tessini, Romina Torres


Citations: 2

Abstract

Information systems are exposed to many types of attack over the network. Network Intrusion Detection Systems (NIDSs) therefore analyze the behavior of network traffic to detect anomalies and possible cyberattacks. A NIDS must detect these cyberattacks efficiently and effectively from a set of features, and its performance is expected to depend on both the selected features and the machine learning technique used. The main goal of this work is to identify the most relevant features required to distinguish, with high sensitivity and precision, between normal traffic and a network intrusion, together with the most relevant features associated with the identification of a specific type of attack. To accomplish this goal, we carry out a comparative study of different decision tree-based machine learning techniques combined with several feature selection techniques. Random Forest and XGBoost achieved an F-measure of up to 98.5% when the complete set of features was used. The F-measure dropped only slightly, to 98%, when the 10 most relevant features were used. Moreover, the model using only the 10 most relevant features was able to identify each type of attack separately with an F-measure of at least 90%. We conclude that it is possible to obtain and rank a subset of the most relevant features that characterize the intrusion pattern in network traffic, which supports the decision of how many features to include at runtime in a real network environment.
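The F-measure reported in the abstract combines precision and sensitivity (recall) into a single score, and the feature ranking amounts to keeping the k highest-scoring features. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The labels, feature names, and importance scores are hypothetical toy values chosen only for illustration; in practice the scores would come from a trained tree ensemble such as Random Forest or XGBoost.

```python
from typing import Dict, List, Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F-measure (F1): harmonic mean of precision and sensitivity (recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and sensitivity are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return 2 * precision * sensitivity / (precision + sensitivity)


def top_k_features(importance: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy labels (1 = intrusion, 0 = normal traffic); illustrative values only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f_measure(y_true, y_pred)

# Hypothetical importance scores standing in for a tree ensemble's output.
importance = {"feat_a": 0.05, "feat_b": 0.40, "feat_c": 0.30, "feat_d": 0.25}
selected = top_k_features(importance, k=2)

print(f"F-measure: {score:.2f}")
print("Selected features:", selected)
```

Comparing the F-measure of a model trained on all features with one trained only on the selected subset, for increasing k, is one way to decide how many features are worth collecting at runtime.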

