An empirical study on utilizing online k-means clustering for intrusion detection purposes

Remah Younisse, Q. A. Al-Haija
Published in: 2023 International Conference on Smart Applications, Communications and Networking (SmartNets)
Publication date: 2023-07-25
DOI: 10.1109/SmartNets58706.2023.10215737 (https://doi.org/10.1109/SmartNets58706.2023.10215737)
Citations: 0

Abstract

K-means clustering is widely used in data mining. The standard (offline) k-means algorithm passes over the data in multiple iterations and assumes that the whole dataset is accessible in every iteration. Online data does not offer this luxury, because records arrive over time, so an online version of k-means clustering has to be used instead. Online data is common in cybersecurity applications such as intrusion detection systems (IDS). In this work, we develop an unsupervised learning-based IDS using an online k-means clustering algorithm. We measure how effectively the IDS clusters highly unbalanced online data generated from an IoT network environment attacked by diverse intrusions, using various cluster centers. The evaluation was performed on both raw (unnormalized) and normalized data records, and the performance of online k-means clustering was compared with that of offline k-means clustering. The results showed that online clustering can perform comparably to offline k-means clustering, especially on normalized data traffic, where it scored an overall clustering purity of 99% for normal packets and 93% for anomaly packets. The model also peaked at an overall F1 score of 99% for normal packet prediction and 94% for anomaly packet prediction.
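The abstract does not spell out which online update rule the authors used, so the following is only a minimal sketch of one common choice: the sequential (running-mean) k-means update, in which each arriving record is assigned to its nearest center and that center moves 1/n of the way toward the record. The function names `online_kmeans` and `purity`, the seeding of centers from the first k records, and the toy stream are all assumptions made for illustration; they are not taken from the paper. The purity shown here is the usual overall definition (the share of records that belong to their cluster's majority class), which may differ from the per-class figures reported above.

```python
from collections import Counter, defaultdict


def online_kmeans(stream, k):
    """Cluster a stream in a single pass, one record at a time."""
    centers, counts, labels = [], [], []
    for x in stream:
        if len(centers) < k:
            # Assumption: the first k records seed the k centers.
            centers.append(list(x))
            counts.append(1)
            labels.append(len(centers) - 1)
            continue
        # Assign the record to its nearest center (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - c) ** 2 for a, c in zip(x, centers[i])),
        )
        counts[nearest] += 1
        # Running-mean update: move the center 1/n of the way toward the record.
        centers[nearest] = [
            c + (a - c) / counts[nearest] for a, c in zip(x, centers[nearest])
        ]
        labels.append(nearest)
    return centers, labels


def purity(labels, truth):
    """Overall purity: fraction of records in their cluster's majority class."""
    by_cluster = defaultdict(Counter)
    for cluster, cls in zip(labels, truth):
        by_cluster[cluster][cls] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(labels)


# Hypothetical toy stream: four "normal" records near the origin (class 0)
# and two "anomaly" records far away (class 1). Not data from the paper.
stream = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.0), (0.0, 0.1), (10.1, 10.0), (0.1, 0.1)]
truth = [0, 1, 0, 0, 1, 0]

centers, labels = online_kmeans(stream, k=2)
print("centers:", centers)
print("purity:", purity(labels, truth))
```

Unlike offline k-means, this loop never revisits a record, so its result depends on arrival order and on how the centers are seeded. Because the distance is Euclidean, features with large numeric ranges dominate the assignment, which is one plausible reason normalization matters in the reported results.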