Towards Detecting Anomalies in Log-Event Sequences with Deep Learning: Open Research Challenges

Proceedings of the 2023 European Interdisciplinary Cybersecurity Conference. Published: 2023-06-14. DOI: 10.1145/3590777.3590789

Patrick Himler, Max Landauer, Florian Skopik, Markus Wurzenberger

Citations: 2

Abstract

Anomaly Detection (AD) is an important area for reliably detecting malicious behavior and attacks on computer systems. Log data is a rich source of information about systems and thus a suitable input for AD. Given the sheer amount of log data available today, Machine Learning (ML) and its further development, Deep Learning (DL), have been applied for years to build AD models. Especially when processing complex log data, DL often achieves better performance than traditional ML. To detect anomalous patterns that span multiple log lines, these lines must be grouped into log-event sequences. This work uses a Long Short-Term Memory (LSTM) model for AD, one of the most important approaches for representing long-range temporal dependencies in log-event sequences of arbitrary length: past events are used to predict whether future events are normal or anomalous. For the LSTM model, we adapt a state-of-the-art open-source implementation called LogDeep. For the evaluation, we use a Hadoop Distributed File System (HDFS) data set that is well studied in current research, and an open-source Audit data set provided by the Austrian Institute of Technology (AIT). We show that padding, a common preprocessing step, strongly influences the AD process and artificially improves detection results, and thus accuracy, in lab testing; without padding, it is not possible to achieve results of the same high quality as those reported in the literature. Furthermore, we analyze limitations of DL approaches applied to AD and list future research priorities and design challenges.
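The grouping and next-event prediction steps described in the abstract can be illustrated with a small, dependency-free sketch. It rests on several assumptions that do not come from the paper or from LogDeep: HDFS-style block IDs of the form `blk_<digits>`, log lines already parsed into event template IDs, a right-padding scheme with a placeholder ID `PAD = -1`, and a frequency-count stand-in (`fit_counts`, `top_k`) in place of a trained LSTM. All function names (`group_by_block`, `next_event_windows`, `is_anomalous`) are hypothetical. The anomaly criterion (flag a sequence if an observed next event is not among the top-k predicted candidates) is similar in spirit to DeepLog-style next-event models.

```python
import re
from collections import Counter, defaultdict

# Assumption: HDFS-style block IDs of the form "blk_<digits>" (optionally negative).
BLOCK_ID = re.compile(r"blk_-?\d+")
PAD = -1  # hypothetical placeholder event ID, used only for padding


def group_by_block(parsed_lines):
    """Group (event_id, message) pairs into one log-event sequence per block ID,
    keeping log order. Lines that mention no block ID are skipped."""
    sequences = defaultdict(list)
    for event_id, message in parsed_lines:
        # dict.fromkeys de-duplicates block IDs within a line while keeping order.
        for block in dict.fromkeys(BLOCK_ID.findall(message)):
            sequences[block].append(event_id)
    return dict(sequences)


def next_event_windows(sequence, window_size, pad=False):
    """Split a sequence into (history, next_event) pairs for next-event prediction.
    Without padding, sequences shorter than window_size + 1 yield no pairs; with
    padding they are right-padded with PAD until exactly one pair fits."""
    seq = list(sequence)
    if pad and len(seq) < window_size + 1:
        seq += [PAD] * (window_size + 1 - len(seq))
    return [(tuple(seq[i:i + window_size]), seq[i + window_size])
            for i in range(len(seq) - window_size)]


def fit_counts(train_windows):
    """Toy stand-in for a trained LSTM: count which event followed each history."""
    counts = defaultdict(Counter)
    for history, nxt in train_windows:
        counts[history][nxt] += 1
    return counts


def top_k(counts, history, k):
    """Return the k most frequent next events seen after `history` (empty if unseen)."""
    return {event for event, _ in counts.get(history, Counter()).most_common(k)}


def is_anomalous(windows, counts, k):
    """Flag a sequence if any observed next event is outside the top-k candidates
    for its history. A sequence with no windows is never flagged."""
    return any(nxt not in top_k(counts, hist, k) for hist, nxt in windows)


if __name__ == "__main__":
    # Hypothetical, already-parsed log lines: (event template ID, raw message).
    logs = [
        (5, "Receiving block blk_1 src: /10.0.0.1"),
        (5, "Receiving block blk_2 src: /10.0.0.2"),
        (22, "BLOCK* NameSystem.allocateBlock: blk_1"),
        (11, "PacketResponder 1 for block blk_1 terminating"),
        (9, "Received block blk_1 of size 67108864"),
    ]
    seqs = group_by_block(logs)
    print(seqs)  # {'blk_1': [5, 22, 11, 9], 'blk_2': [5]}

    # Treat blk_1 as normal training data, then evaluate the short session blk_2.
    counts = fit_counts(next_event_windows(seqs["blk_1"], window_size=2))
    for pad in (False, True):
        windows = next_event_windows(seqs["blk_2"], window_size=2, pad=pad)
        print(pad, windows, is_anomalous(windows, counts, k=1))
    # False [] False
    # True [((5, -1), -1)] True
```

Without padding, the one-event session `blk_2` produces no windows and therefore can never be flagged; with padding, it produces a single window built almost entirely from placeholders, so its verdict is driven by the padding rather than by observed behavior. This toy only illustrates why padding can shift evaluation outcomes; it is not a reproduction of the paper's LogDeep setup or results.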
