Feature Engineering in Machine Learning-Based Intrusion Detection Systems for OT Networks

2023 IEEE International Conference on Smart Computing (SMARTCOMP), Pub Date: 2023-06-01, DOI: 10.1109/SMARTCOMP58114.2023.00086

Alex Howe, M. Papa


Citations: 2

Abstract

This paper evaluates the importance of feature exploration and engineering when applying machine learning to intrusion detection in OT (Operational Technology) networks. The data consisted of raw network traffic captures from a simulated OT environment communicating over the Modbus/TCP protocol. Feature engineering identified thirty-eight attributes of interest across the layers of the network stack. The Random Forest algorithm was used to rank the importance of each feature for detecting anomalous network behavior. Both supervised and unsupervised learning methods were evaluated: Random Forest, Support Vector Machines, K-Nearest Neighbors, K-Means Clustering, and Isolation Forest. Results indicate that statistics-based features, together with features derived from the protocol and application layers, contained the information best suited to detecting anomalous OT behavior. Additionally, variable-importance-based feature selection reduced model complexity and improved the detection rate compared with models trained on the original high-dimensional data. Random Forest and Support Vector Machines had the best detection performance but required a large amount of labeled data for training and validation. Notably, Isolation Forest showed potential for anomaly detection in OT networks, as it requires no labeled data and produced promising results.
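As a rough illustration of what application-layer and statistics-based features can look like for Modbus/TCP traffic, the sketch below parses the standard 7-byte MBAP header plus the function code and then summarizes a window of packets. It is not the authors' feature set: the function names `parse_modbus_tcp` and `window_features`, the choice of statistics, and the `write_ratio` feature are all assumptions made for this example, and the abstract does not list the thirty-eight attributes actually used.

```python
import struct
from collections import Counter
from statistics import mean, pstdev

# MBAP header: transaction id, protocol id, length (2 bytes each), unit id (1 byte), big-endian.
MBAP_FORMAT = ">HHHB"
MBAP_SIZE = struct.calcsize(MBAP_FORMAT)  # 7 bytes

# Modbus function codes that write coils or registers (5, 6, 15, 16).
WRITE_CODES = {5, 6, 15, 16}


def parse_modbus_tcp(payload: bytes) -> dict:
    """Extract application-layer fields from one Modbus/TCP message (TCP payload only)."""
    if len(payload) < MBAP_SIZE + 1:
        raise ValueError("payload too short for MBAP header plus function code")
    transaction_id, protocol_id, length, unit_id = struct.unpack_from(MBAP_FORMAT, payload)
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": payload[MBAP_SIZE],
    }


def window_features(packets: list) -> dict:
    """Summarize a time-ordered window of (timestamp_seconds, payload_bytes) tuples.

    The statistics chosen here are hypothetical examples, not the paper's features.
    """
    if not packets:
        raise ValueError("window must contain at least one packet")
    parsed = [parse_modbus_tcp(payload) for _, payload in packets]
    times = [timestamp for timestamp, _ in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    sizes = [len(payload) for _, payload in packets]
    codes = Counter(fields["function_code"] for fields in parsed)
    return {
        "packet_count": len(packets),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "distinct_function_codes": len(codes),
        "write_ratio": sum(codes[c] for c in WRITE_CODES) / len(packets),
    }


if __name__ == "__main__":
    # Hand-built sample messages: a Read Holding Registers request (function 3)
    # followed by a Write Single Register request (function 6).
    read_req = bytes.fromhex("000100000006010300000002")
    write_req = bytes.fromhex("00020000000601060001000a")
    print(parse_modbus_tcp(read_req))
    print(window_features([(0.0, read_req), (0.5, write_req)]))
```

Feature vectors of this kind could then be passed to an importance-ranking step and to the classifiers or anomaly detectors named in the abstract; that modeling stage is left out here to keep the sketch dependency-free.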


