{"title":"Bi-LSTM: Finding Network Anomaly Based on Feature Grouping Clustering","authors":"Mengbo Xiong, Hui-ya Ma, Zhou Fang, Dong Wang, Qiuyun Wang, Xuren Wang","doi":"10.1145/3426826.3426843","DOIUrl":null,"url":null,"abstract":"Intrusion detection is one of the key technologies to ensure the security of cyberspace. In this paper, a detection model of Bi-LSTM, whose powerful serialization modeling function can discover the time series characteristics from network data, combined with machine learning algorithm K-means is proposed. We know that the data collected by network sensor or audit log has many attributes. In order to achieve a successful classification with low computational cost, it is important to employing the most relevant and discriminating features. How to extract useful information from those attributes to improve detection rate and reduce false detection are challenging. First, we group attributes according to the conditions on which they are collected or more generally, evenly. Then we cluster attributes of each group with K-means. So, we got the same number of hyper-features as the number of the groups. On the one side data reduction is significant and the data volume was greatly declined up to 85%. On the other side, the extracted features, also called hyper features, are more concentrated and informative than the low-level attributes. Detection rate on the high-level features is better than that on original attributes, both with traditional machine learning classification of C4.5 or our hybrid model. The intrusion detection rate of the powerful serialization model, Bi-LSTM based on K-means, is as high as 99.93%, the accuracy rate as high as 98.84%, and the false detection rate is 0. Moreover, experiments show that our Bi-LSTM model plus K-means works well with new attacks only appeared in test data too, which is meaningful for intrusion detection.","PeriodicalId":202857,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence","volume":"130 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3426826.3426843","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Intrusion detection is one of the key technologies to ensure the security of cyberspace. In this paper, a detection model of Bi-LSTM, whose powerful serialization modeling function can discover the time series characteristics from network data, combined with machine learning algorithm K-means is proposed. We know that the data collected by network sensor or audit log has many attributes. In order to achieve a successful classification with low computational cost, it is important to employing the most relevant and discriminating features. How to extract useful information from those attributes to improve detection rate and reduce false detection are challenging. First, we group attributes according to the conditions on which they are collected or more generally, evenly. Then we cluster attributes of each group with K-means. So, we got the same number of hyper-features as the number of the groups. On the one side data reduction is significant and the data volume was greatly declined up to 85%. On the other side, the extracted features, also called hyper features, are more concentrated and informative than the low-level attributes. Detection rate on the high-level features is better than that on original attributes, both with traditional machine learning classification of C4.5 or our hybrid model. The intrusion detection rate of the powerful serialization model, Bi-LSTM based on K-means, is as high as 99.93%, the accuracy rate as high as 98.84%, and the false detection rate is 0. Moreover, experiments show that our Bi-LSTM model plus K-means works well with new attacks only appeared in test data too, which is meaningful for intrusion detection.