Bi-LSTM: Finding Network Anomaly Based on Feature Grouping Clustering

Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence. Pub Date: 2020-09-18. DOI: 10.1145/3426826.3426843

Mengbo Xiong, Hui-ya Ma, Zhou Fang, Dong Wang, Qiuyun Wang, Xuren Wang


Citations: 1

Abstract

Intrusion detection is one of the key technologies for securing cyberspace. This paper proposes a detection model that combines Bi-LSTM, whose sequence-modeling capability can uncover time-series characteristics in network data, with the K-means clustering algorithm. Data collected by network sensors or audit logs has many attributes. To classify successfully at low computational cost, it is important to employ the most relevant and discriminating features. Extracting useful information from those attributes to improve the detection rate and reduce false detections is challenging. First, we group the attributes according to the conditions under which they are collected or, more generally, evenly. Then we cluster the attributes of each group with K-means, which yields as many hyper-features as there are groups. On the one hand, the data reduction is significant: the data volume declines by up to 85%. On the other hand, the extracted hyper-features are more concentrated and informative than the low-level attributes. The detection rate on the high-level features is better than on the original attributes, both with the traditional C4.5 machine learning classifier and with our hybrid model. The Bi-LSTM model based on K-means achieves an intrusion detection rate as high as 99.93%, an accuracy as high as 98.84%, and a false detection rate of 0. Moreover, experiments show that the Bi-LSTM plus K-means model also works well on new attacks that appear only in the test data, which is meaningful for intrusion detection.
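The grouping and clustering steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the even grouping variant, uses the K-means cluster index as the hyper-feature value (the abstract does not say how the hyper-feature is encoded), and relies on a plain NumPy version of Lloyd's algorithm. The function names `even_groups`, `kmeans_labels`, and `hyper_features`, the data sizes, and the choice `k=5` are all hypothetical. The Bi-LSTM classifier that would consume the hyper-features is omitted.

```python
import numpy as np


def even_groups(n_attributes, n_groups):
    """Split attribute indices into n_groups nearly equal, contiguous groups."""
    return np.array_split(np.arange(n_attributes), n_groups)


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per row of X."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct rows of X.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers)


def hyper_features(X, groups, k):
    """One hyper-feature per group: the cluster index of each record (assumed encoding)."""
    return np.stack([kmeans_labels(X[:, g], k) for g in groups], axis=1)


# Illustrative sizes only: 200 synthetic records with 40 attributes, 6 groups.
X = np.random.default_rng(1).normal(size=(200, 40))
groups = even_groups(X.shape[1], 6)
H = hyper_features(X, groups, k=5)
print(H.shape)  # one column per group instead of one per attribute
```

Each record is reduced from one value per attribute to one value per group, which is where the data reduction comes from. A sequence model such as a Bi-LSTM would then be trained on windows of these hyper-feature rows.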



