Authors: Remah Younisse, Q. A. Al-Haija
Published in: 2023 International Conference on Smart Applications, Communications and Networking (SmartNets), 2023-07-25
DOI: 10.1109/SmartNets58706.2023.10215737
An empirical study on utilizing online k-means clustering for intrusion detection purposes
K-means clustering is widely used in data mining applications. The standard k-means algorithm passes over the data to be classified in multiple iterations, assuming that the whole dataset is accessible in every iteration. Online data does not offer that luxury, since the complete data is never available at one time, so online versions of k-means clustering must be used instead. Online data is common in cybersecurity applications such as intrusion detection systems (IDS). In this work, we develop an unsupervised learning-based IDS using an online k-means clustering algorithm. We measure how well the IDS clusters highly unbalanced online data generated from an IoT network environment attacked by diverse intrusions, using various cluster centers. The evaluation was performed for both raw (unnormalized) and normalized data records, and the performance of online k-means clustering was compared to that of offline k-means clustering. The results showed that the online clustering method can perform comparably to offline k-means clustering, especially on normalized data traffic, where it scored an overall clustering purity of 99% for normal packets and 93% for anomaly packets. The model also peaked at an overall F1 score of 99% for normal packet prediction and 94% for anomaly packet prediction.
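The difference between offline and online k-means described above can be illustrated with a minimal sketch. It assumes the sequential (MacQueen-style) running-mean update, Euclidean distance, and seeding from the first k records; the abstract does not say which online variant, seeding scheme, or distance the authors used. The names `OnlineKMeans`, `partial_fit`, and `purity` are hypothetical, and the purity function is the standard overall definition, which may differ from the per-class figures reported in the paper.

```python
import math
from collections import Counter


def nearest(centers, x):
    """Return the index of the center closest to x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(centers[j], x))


class OnlineKMeans:
    """Sequential k-means: each record is seen once, then discarded."""

    def __init__(self, k):
        self.k = k
        self.centers = []  # current cluster centers
        self.counts = []   # number of records assigned to each center

    def partial_fit(self, x):
        """Assign x to a cluster, update that center, return the cluster index."""
        if len(self.centers) < self.k:
            # Assumption: the first k records seed the centers.
            self.centers.append(list(x))
            self.counts.append(1)
            return len(self.centers) - 1
        j = nearest(self.centers, x)
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # running-mean step size
        self.centers[j] = [c + eta * (xi - c) for c, xi in zip(self.centers[j], x)]
        return j


def purity(assignments, labels):
    """Fraction of records that carry the majority label of their cluster."""
    per_cluster = {}
    for cluster, label in zip(assignments, labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority = sum(max(c.values()) for c in per_cluster.values())
    return majority / len(labels)


# Toy stream (made-up data): unbalanced, mostly "normal" records.
stream = [
    ((0.0, 0.0), "normal"),
    ((10.0, 10.0), "anomaly"),
    ((1.0, 0.0), "normal"),
    ((0.0, 1.0), "normal"),
    ((9.0, 10.0), "anomaly"),
    ((1.0, 1.0), "normal"),
]

model = OnlineKMeans(k=2)
assignments = [model.partial_fit(x) for x, _ in stream]
labels = [label for _, label in stream]
print(assignments, purity(assignments, labels))
```

Because each center is a running mean of the records assigned to it, the model never needs to revisit earlier data, which is the property that makes it usable on a packet stream. The sketch also shows why normalization can matter: with Euclidean distance, a feature with a large raw range dominates the nearest-center decision.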