Towards Resilient and Efficient Big Data Storage: Evaluating a SIEM Repository Based on HDFS
Authors: I. Saenko, Igor Kotenko
Published in: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2022-03-01
DOI: https://doi.org/10.1109/pdp55904.2022.00051
Citations: 1
Abstract
Building an efficient, scalable, distributed storage system is a challenge in many industries. Currently, the most promising and efficient way to organize such storage is to use the Hadoop Distributed File System (HDFS). It is therefore of interest to develop an approach that optimizes the distribution of replicas across storage nodes and thereby provides the required resilience and efficiency of big data storage in HDFS-based distributed systems. An analysis of well-known works on big data processing showed that the simultaneous provision of resilient and efficient data storage is hardly addressed. The paper proposes an approach based on probabilistic models developed for assessing the resilience and efficiency of data storage. The models account for both random and deterministic modes of distributing data blocks across network nodes and make it possible to solve the task of providing resilient and efficient big data storage under three kinds of criteria: resilience maximization, efficiency maximization, and constraints on both resilience and efficiency. The objective of the experiment was to optimize the variables of the security information and event management (SIEM) system. The experiments confirmed the effectiveness of the proposed approach and its ability to solve the assigned tasks under each of the three selected criteria. For the deterministic mode, genetic algorithms (GAs) were used, with some improvements to chromosome construction and to the types of fitness functions.
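The abstract does not state the probabilistic models themselves, so the following is only a minimal sketch of how the resilience of a replica placement could be assessed. It is not the authors' model. It assumes that nodes fail independently with the same probability, and that a placement is resilient when every block keeps at least one live replica. The names `random_placement`, `resilience`, and `p_fail` are hypothetical and introduced purely for illustration.

```python
import itertools
import random


def random_placement(num_blocks, num_nodes, replicas, seed=0):
    """Random mode (illustrative): put each block's replicas on distinct,
    uniformly chosen nodes."""
    rng = random.Random(seed)
    return [
        frozenset(rng.sample(range(num_nodes), replicas))
        for _ in range(num_blocks)
    ]


def resilience(placement, num_nodes, p_fail):
    """Exact probability that every block keeps at least one live replica.

    Assumption (not from the paper): nodes fail independently, each with
    probability p_fail. All 2**num_nodes failure patterns are enumerated,
    so this is only practical for small clusters.
    """
    total = 0.0
    for state in itertools.product((False, True), repeat=num_nodes):
        failed = {node for node, is_failed in enumerate(state) if is_failed}
        # A block is lost only if all of its replicas sit on failed nodes.
        if all(not block <= failed for block in placement):
            k = len(failed)
            total += (p_fail ** k) * ((1 - p_fail) ** (num_nodes - k))
    return total


if __name__ == "__main__":
    placement = random_placement(num_blocks=6, num_nodes=5, replicas=3, seed=1)
    print(resilience(placement, num_nodes=5, p_fail=0.05))
```

A deterministic mode could score candidate placements with a function like `resilience` and search over them, for example with a genetic algorithm whose fitness combines such a score with an efficiency measure. How the paper encodes chromosomes and defines its fitness functions is not given in the abstract.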