Anomaly detection based on system text logs of virtual network functions

IF 4.2 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Research. Pub Date: 2024-08-02. DOI: 10.1016/j.bdr.2024.100485

Daniela N. Rim, DongNyeong Heo, Chungjun Lee, Sukhyun Nam, Jae-Hyoung Yoo, James Won-Ki Hong, Heeyoul Choi

Big Data Research, volume 38, Article 100485. https://www.sciencedirect.com/science/article/pii/S2214579624000601

Citations: 0

Abstract

In virtual network environments, building secure and effective systems is crucial for correct operation, so anomaly detection is a core task. To uncover and predict abnormal behavior in a virtual machine, it is desirable to extract relevant information from system text logs. The main difficulty is that text is unstructured, symbolic data that is expensive to process. However, recent advances in deep learning have shown remarkable capabilities for handling such data. In this work, we propose a simple LSTM recurrent network on top of a pre-trained Sentence-BERT, which encodes the system logs into fixed-length vectors. We train the model in an unsupervised fashion to learn the likelihood of the encoded log sequences. This way, the model can trigger a warning with an accuracy of 81% when a virtual machine generates an abnormal sequence. Our approach is not only easy to train and computationally cheap, but also generalizes to the content of any input.
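The pipeline described in the abstract (encode each log line into a fixed-length vector, model sequences of those vectors using only normal data, and warn when an incoming sequence looks unlikely) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The bag-of-words `encode` is only a stand-in for Sentence-BERT, the window-mean `predict_next` is only a stand-in for the LSTM, and scoring by squared prediction error is an assumption, since the abstract says only that the model learns the likelihood of log sequences. All function names, the sample log lines, `window`, and `margin` are hypothetical.

```python
import math


def build_vocab(lines):
    """Map each token seen in the normal logs to a vector index."""
    tokens = sorted({tok for line in lines for tok in line.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def encode(line, vocab):
    """Stand-in for Sentence-BERT: normalized bag-of-words of fixed length.

    The extra last dimension collects tokens never seen in the normal logs.
    """
    vec = [0.0] * (len(vocab) + 1)
    for tok in line.lower().split():
        vec[vocab.get(tok, len(vocab))] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict_next(history):
    """Stand-in for the LSTM: predict the next vector as the window mean."""
    return [sum(col) / len(history) for col in zip(*history)]


def anomaly_scores(lines, vocab, window=3):
    """Squared error between the predicted and the observed vector per step."""
    vecs = [encode(line, vocab) for line in lines]
    scores = []
    for i in range(window, len(vecs)):
        pred = predict_next(vecs[i - window:i])
        scores.append(sum((p - a) ** 2 for p, a in zip(pred, vecs[i])))
    return scores


def fit_threshold(normal_lines, vocab, window=3, margin=1.5):
    """Calibrate the warning threshold on anomaly-free logs only."""
    return margin * max(anomaly_scores(normal_lines, vocab, window))


# Hypothetical log lines, invented for illustration.
normal = [
    "vnf firewall heartbeat ok",
    "vnf firewall session opened",
    "vnf firewall session closed",
] * 4
vocab = build_vocab(normal)
threshold = fit_threshold(normal, vocab)

incoming = normal[:3] + ["kernel panic fatal exception"]
flags = [s > threshold for s in anomaly_scores(incoming, vocab)]
print(flags)
```

In this toy setting the final line, which shares no vocabulary with the normal logs, scores above the calibrated threshold, while every step of the normal sequence stays below it. A pre-trained sentence encoder and a trained recurrent model would replace the two stand-ins, leaving the calibrate-then-flag structure unchanged.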


Source journal

Big Data Research (Computer Science: Computer Science Applications)

CiteScore: 8.40

Self-citation rate: 3.00%

Publication count

Journal description: The journal aims to promote and communicate advances in big data research by providing a fast and high quality forum for researchers, practitioners and policy makers from the very many different communities working on, and with, this topic. The journal will accept papers on foundational aspects in dealing with big data, as well as papers on specific Platforms and Technologies used to deal with big data. To promote Data Science and interdisciplinary collaboration between fields, and to showcase the benefits of data driven research, papers demonstrating applications of big data in domains as diverse as Geoscience, Social Web, Finance, e-Commerce, Health Care, Environment and Climate, Physics and Astronomy, Chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.