Algorithms and methods of data clustering in the analysis of information security event logs

Digital Technology Security, Pub Date: 2022-03-30, DOI: 10.17212/2782-2230-2022-1-41-60

Diana N. Sidorova, Evgeniy N. Pivkin


Abstract

Security event logs reflect the state of an information system and make it possible to detect anomalies in user behavior and cybersecurity incidents. This article reviews the existing kinds of event logs (application, system, and security event logs) and how they are divided into types. Automated analysis of security event log data is difficult, however, because the logs contain large amounts of unstructured data collected from various sources. The article therefore states and describes the problem of analyzing information security event logs. To address it, the authors consider new and comparatively little-studied data clustering methods and algorithms: Random forest, incremental clustering, and IPLoM (Iterative Partitioning Log Mining). The Random forest algorithm builds decision trees on samples of the data, obtains a prediction from each tree, and selects the final result by voting. Averaging over the trees reduces overfitting. The algorithm is also used for regression and classification problems. Incremental clustering defines clusters as groups of objects that belong to the same class or concept, which is a specific set of pairs. The clusters may overlap, which allows a degree of "fuzziness" for samples that lie on the boundaries between clusters. The IPLoM algorithm uses the unique characteristics of log messages to partition the log iteratively, which helps extract message types efficiently.
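The iterative partitioning idea behind IPLoM can be illustrated with a minimal Python sketch. It is not the authors' implementation and not the full algorithm: it covers only a partition by token count, a partition by token position, and template extraction, and it omits the bijection-search step. The function names, the `<*>` wildcard, and the `max_unique` threshold are assumptions made for this illustration.

```python
from collections import defaultdict

WILDCARD = "<*>"  # hypothetical placeholder for variable token positions


def split_by_token_count(messages):
    """Partition step 1: group messages by their number of tokens."""
    groups = defaultdict(list)
    for message in messages:
        tokens = message.split()
        groups[len(tokens)].append(tokens)
    return list(groups.values())


def split_by_token_position(group, max_unique=2):
    """Partition step 2 (simplified): split on the varying token position
    that has the fewest unique values.

    max_unique is an assumed threshold: positions with more distinct
    values are treated as variables (user names, IDs) and never split on.
    """
    columns = list(zip(*group))
    candidates = [
        (len(set(column)), index)
        for index, column in enumerate(columns)
        if 1 < len(set(column)) <= max_unique
    ]
    if not candidates:
        return [group]
    _, position = min(candidates)
    parts = defaultdict(list)
    for tokens in group:
        parts[tokens[position]].append(tokens)
    return list(parts.values())


def extract_template(group):
    """Keep constant tokens; replace varying positions with the wildcard."""
    return " ".join(
        column[0] if len(set(column)) == 1 else WILDCARD
        for column in zip(*group)
    )


def cluster_log(messages):
    """Run both partition steps and return one template per cluster."""
    templates = []
    for size_group in split_by_token_count(messages):
        for part in split_by_token_position(size_group):
            templates.append(extract_template(part))
    return templates


if __name__ == "__main__":
    sample = [
        "Failed password for alice",
        "Failed password for bob",
        "Accepted password for carol",
    ]
    for template in cluster_log(sample):
        print(template)
```

On the three sample messages, the sketch splits on the first token, so the two failed logins collapse into one message type with the user name as a wildcard, while the accepted login stays in its own cluster.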
