Feature Engineering in Machine Learning-Based Intrusion Detection Systems for OT Networks

Alex Howe, M. Papa
Published in: 2023 IEEE International Conference on Smart Computing (SMARTCOMP), 2023-06-01
DOI: 10.1109/SMARTCOMP58114.2023.00086 (https://doi.org/10.1109/SMARTCOMP58114.2023.00086)
Citations: 2

Abstract

This paper evaluates the importance of feature exploration and engineering when applying machine learning to intrusion detection in OT (Operational Technology) networks. The data consisted of raw network traffic captures from a simulated OT environment communicating over the Modbus/TCP protocol. Feature engineering identified thirty-eight attributes of interest at different layers of the network stack. The Random Forest algorithm was used to analyze the importance of each feature for detecting anomalous network behavior. Both supervised and unsupervised learning methods were evaluated, including Random Forest, Support Vector Machines, K-Nearest Neighbors, K-Means Clustering, and Isolation Forest. Results indicate that statistics-based features, as well as features derived from the protocol and application layers, contained the information best suited to detecting anomalous OT behavior. Additionally, variable importance-based feature selection reduced complexity and improved the detection rate compared with models trained on the original high-dimensional data. Random Forest and Support Vector Machines had the best detection performance but required a large amount of labeled data for training and validation. Notably, Isolation Forest showed potential for anomaly detection in OT networks, as it requires no labeled data and produced promising results.
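The workflow the abstract describes (ranking features by Random Forest importance, keeping the strongest ones, then running an unsupervised detector) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The data is synthetic rather than Modbus/TCP traffic; only the count of 38 features comes from the abstract. The number of informative features, the `TOP_K` cutoff, and the `contamination` value are assumptions chosen for the example. Note that the feature ranking step here uses labels, so only the Isolation Forest fit itself is label-free.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

N_FEATURES = 38     # feature count reported in the abstract
N_INFORMATIVE = 5   # assumption: only these synthetic features carry signal
TOP_K = 10          # hypothetical number of features to keep

# Synthetic stand-in for traffic records: 95% normal, 5% anomalous.
rng = np.random.default_rng(0)
n_normal, n_attack = 1900, 100
X_normal = rng.normal(0.0, 1.0, size=(n_normal, N_FEATURES))
X_attack = rng.normal(0.0, 1.0, size=(n_attack, N_FEATURES))
X_attack[:, :N_INFORMATIVE] += 4.0  # anomalies differ only in a few features
X = np.vstack([X_normal, X_attack])
y = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_attack, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Supervised step: rank features by impurity-based importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
ranked = np.argsort(forest.feature_importances_)[::-1]
selected = np.sort(ranked[:TOP_K])

# Unsupervised step: fit Isolation Forest on the reduced features.
# No labels are passed to fit(); y_test is used only for scoring.
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
iso.fit(X_train[:, selected])
pred = (iso.predict(X_test[:, selected]) == -1).astype(int)  # -1 means anomaly

detection_rate = float(pred[y_test == 1].mean())
false_alarm_rate = float(pred[y_test == 0].mean())
print("selected feature indices:", selected.tolist())
print("detection rate:", round(detection_rate, 3))
print("false alarm rate:", round(false_alarm_rate, 3))
```

On real captures, the feature matrix would come from per-packet or per-flow attributes extracted from the traffic, and the cutoff for how many features to keep would need to be validated rather than fixed in advance.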