Towards Resilient and Efficient Big Data Storage: Evaluating a SIEM Repository Based on HDFS

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2022-03-01. DOI: 10.1109/pdp55904.2022.00051

I. Saenko, Igor Kotenko


Citations: 1

Abstract

Building an efficient, scalable, distributed storage system is a challenge in many industries. Currently, the most promising and efficient way to organize such storage is the Hadoop Distributed File System (HDFS). It is therefore of interest to develop an approach that optimizes the distribution of replicas across storage nodes so as to provide the required resilience and efficiency of big data storage in HDFS-based distributed systems. An analysis showed that well-known works on big data processing rarely address the simultaneous provision of resilient and efficient data storage. The paper proposes an approach based on probabilistic models developed for assessing the resilience and efficiency of data storage. The models take into account the random and deterministic modes of distributing data blocks across network nodes and allow the task of providing resilient and efficient big data storage to be solved under three kinds of criteria: resilience maximization, efficiency maximization, and constraints on both resilience and efficiency. The objective of the experiment was to optimize the variables of the security information and event management (SIEM) system. The experiments confirmed the effectiveness of the proposed approach and showed that it can solve the assigned tasks under the three selected kinds of criteria. Genetic algorithms (GAs) were used for the deterministic mode, with some improvements to the construction of chromosomes and the types of fitness functions.
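The abstract does not give the probabilistic models themselves, so the following is only a minimal sketch of what a resilience estimate for the random mode of block distribution might look like. It assumes independent node failures with a common probability and treats the data as resilient when every block keeps at least one live replica. The function names, parameters, and the failure model are illustrative assumptions, not the authors' method.

```python
import random


def single_block_availability(node_fail_prob, replicas):
    """Closed form for one block under the assumed model: the block is
    lost only if all of its replicas sit on failed nodes."""
    return 1.0 - node_fail_prob ** replicas


def random_placement(num_blocks, num_nodes, replicas, rng):
    """Random mode (assumed): each block gets `replicas` distinct nodes
    chosen uniformly at random."""
    return [tuple(rng.sample(range(num_nodes), replicas))
            for _ in range(num_blocks)]


def estimate_resilience(placement, num_nodes, node_fail_prob, trials, rng):
    """Monte Carlo estimate of the probability that every block keeps
    at least one replica on a node that has not failed."""
    survived = 0
    for _ in range(trials):
        # Draw an independent failure for each node (assumption).
        failed = {n for n in range(num_nodes) if rng.random() < node_fail_prob}
        if all(any(n not in failed for n in nodes) for nodes in placement):
            survived += 1
    return survived / trials


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs are comparable
    placement = random_placement(num_blocks=200, num_nodes=20, replicas=3, rng=rng)
    estimate = estimate_resilience(placement, 20, node_fail_prob=0.05,
                                   trials=2000, rng=rng)
    print(f"estimated resilience: {estimate:.3f}")
```

A deterministic mode could be compared by passing a hand-built or optimized placement list to `estimate_resilience` in place of the random one; an estimate of this kind is one candidate for the fitness function of a GA, though the paper's actual chromosome encoding and fitness functions are not described in the abstract.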
