{"title":"An Algorithm-Based Fault Tolerance Strategy for the Bitonic Sort Parallel Algorithm","authors":"E. T. Camargo, E. P. Duarte","doi":"10.1109/ladc53747.2021.9672590","DOIUrl":"https://doi.org/10.1109/ladc53747.2021.9672590","url":null,"abstract":"High Performance Computing (HPC) systems are employed to solve hard problems and rely on parallel algorithms which present very long execution times - up to several days. These systems are expensive in terms of the computational resources required, including energy consumption. Thus, after failures occur it is highly desirable to lose as little of the work that has already been done as possible. In this work we present an Algorithm-Based Fault Tolerance (ABFT) strategy that can be applied to make a robust version of any hypercube-based parallel algorithm. Note that we do not assume a physical hypercube: after nodes crash, fault-free nodes autonomously adapt themselves according to a logical topology called VCube, preserving several logarithmic properties. The proposed strategy guarantees that the algorithm does not halt even after up to (N - 1) nodes crash, in a system of N nodes. We use parallel sorting as a case study, describing how to make a fault-tolerant version of the Bitonic Sort parallel algorithm. The algorithm was implemented in MPI using ULFM to handle faults. Experimental results are presented showing the performance and robustness of the proposed solution.","PeriodicalId":376642,"journal":{"name":"2021 10th Latin-American Symposium on Dependable Computing (LADC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124283527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. V. D. Merladet, Rodrigo De Melo Silveira, S. Fugivara, C. Lahoz
{"title":"Safety analysis of Brazilian suborbital launch operations based on system-theoretic approach","authors":"A. V. D. Merladet, Rodrigo De Melo Silveira, S. Fugivara, C. Lahoz","doi":"10.1109/ladc53747.2021.9672557","DOIUrl":"https://doi.org/10.1109/ladc53747.2021.9672557","url":null,"abstract":"The proposed analysis consists of identifying aspects that can influence safety and mission fulfilment in Brazilian Suborbital Launch Operations through the application of System-Theoretic Process Analysis, a new hazard analysis technique capable of identifying potentially hazardous design and operational flaws, including system design errors and unsafe interactions among multiple procedures and system components. This work identifies losses, hazards, system-level safety constraints, the hierarchical control structure of the general system, unsafe control actions, loss scenarios that could occur and related causal factors, detecting opportunities for improvement in future launch operations of Brazilian suborbital launch vehicles by acting throughout the life cycle of the products to avoid undesired events or mitigate their consequences.","PeriodicalId":376642,"journal":{"name":"2021 10th Latin-American Symposium on Dependable Computing (LADC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127994644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shahid Khan, J. Katoen, Matthias Volk, Muhammad Ahmad Zafar, Falak Sher
{"title":"Modelling and Analysis of Fire Sprinklers by Verifying Dynamic Fault Trees","authors":"Shahid Khan, J. Katoen, Matthias Volk, Muhammad Ahmad Zafar, Falak Sher","doi":"10.1109/ladc53747.2021.9672579","DOIUrl":"https://doi.org/10.1109/ladc53747.2021.9672579","url":null,"abstract":"We study the reliability analysis of fire sprinkler systems. We show that the characteristic features of Dugan's dynamic fault trees (DFTs) such as spare management, temporal ordering of failures and functional dependencies, are natural and adequate mechanisms to model various relevant phenomena in realistic fire sprinklers. For DFT analysis, we employ probabilistic model checking, an automated technique to assess reliability along with correctness. This is to date the most scalable, numerical DFT analysis technique. We show how standard reliability measures of fire sprinkler systems can be efficiently computed using the Storm model checker. In addition, we consider metrics beyond standard reliability, e.g., the probability to fail without going through a degradation phase and the worst-case reliability achieved after degradation. We illustrate our approach by fire sprinkler systems in shopping centers.","PeriodicalId":376642,"journal":{"name":"2021 10th Latin-American Symposium on Dependable Computing (LADC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130622417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}