Karim Boubouh, Amine Boussetta, Nirupam Gupta, Alexandre Maurer, Rafael Pinot
{"title":"Democratizing Machine Learning: Resilient Distributed Learning with Heterogeneous Participants","authors":"Karim Boubouh, Amine Boussetta, Nirupam Gupta, Alexandre Maurer, Rafael Pinot","doi":"10.1109/SRDS55811.2022.00019","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00019","url":null,"abstract":"The increasing prevalence of personal devices motivates the design of algorithms that can leverage their computing power, together with the data they generate, in order to build privacy-preserving and effective machine learning models. However, traditional distributed learning algorithms impose a uniform workload on all participating devices, most often discarding the weakest participants. This not only induces a suboptimal use of available computational resources, but also significantly reduces the quality of the learning process, as data held by the slowest devices is discarded from the procedure. This paper proposes HgO, a distributed learning scheme with parameterizable iteration costs that can be adjusted to the computational capabilities of different devices. HgO encourages the participation of slower devices, thereby improving the accuracy of the model when the participants do not share the same dataset. When combined with a robust aggregation rule, HgO can tolerate some level of Byzantine behavior, depending on the hardware profile of the devices (we prove, for the first time, a trade-off between Byzantine tolerance and hardware heterogeneity). We also demonstrate the convergence of HgO, theoretically and empirically, without assuming any specific partitioning of the data over the devices. 
We present an exhaustive set of experiments, evaluating the performance of HgO on several classification tasks and highlighting the importance of incorporating slow devices when learning in a Byzantine-prone environment with heterogeneous participants.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134021066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaosong Gu, Wei Cao, Yicong Zhu, Xuan Song, Yu Huang, Xiaoxing Ma
{"title":"Compositional Model Checking of Consensus Protocols via Interaction-Preserving Abstraction","authors":"Xiaosong Gu, Wei Cao, Yicong Zhu, Xuan Song, Yu Huang, Xiaoxing Ma","doi":"10.1109/SRDS55811.2022.00018","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00018","url":null,"abstract":"Consensus protocols are widely used in building reliable distributed software systems and their correctness is of vital importance. TLA+ is a lightweight formal specification language which enables precise specification of system design and exhaustive checking of the design without any human effort. The features of TLA+ make it widely used in the specification and model checking of consensus protocols, both in academia and in industry. However, the application of TLA+ is limited by the state explosion problem in model checking. Though compositional model checking is essential to tame the state explosion problem, existing compositional checking techniques do not sufficiently consider the characteristics of TLA+. In this work, we propose the Interaction-Preserving Abstraction (IPA) framework, which leverages the features of TLA+ and enables practical and efficient compositional model checking of consensus protocols specified in TLA+. In the IPA framework, system specification is partitioned into multiple modules, and each module is divided into the internal part and the interaction part. The basic idea of the interaction-preserving abstraction is to omit the internal part of each module, such that another module cannot distinguish whether it is interacting with the original module or the coarsened abstract one. We apply the IPA framework to the compositional checking of the TLA+ specifications of two consensus protocols Raft and ParallelRaft. Raft is a consensus protocol which was originally developed in academia and then widely used in industry. ParallelRaft is the replication protocol in PolarFS, the distributed file system for the commercial database Alibaba PolarDB. 
We demonstrate that the IPA framework is easy to use in realistic scenarios and at the same time significantly reduces the model checking cost.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117129130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lélia Blin, C. Johnen, Gabriel Le Bouder, F. Petit
{"title":"Silent Anonymous Snap-Stabilizing Termination Detection","authors":"Lélia Blin, C. Johnen, Gabriel Le Bouder, F. Petit","doi":"10.1109/SRDS55811.2022.00023","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00023","url":null,"abstract":"We address the problem of Termination Detection (TD) in asynchronous networks. It is known that TD cannot be achieved in the context of self-stabilization, except in the specific case where the TD algorithm is snap-stabilizing, i.e., it always behaves according to its specification regardless of the initial configuration. In this paper, we propose a generic, deterministic, snap-stabilizing, silent algorithm that detects whether an observed terminating silent self-stabilizing algorithm, A, has converged to a configuration that satisfies an intended predicate. Our algorithm assumes that nodes know (an upper bound on) the network diameter D. However, it requires no underlying structure, nor specific topology (arbitrary network), and works in anonymous networks, i.e., our algorithm makes no assumption that would allow one or more nodes to be distinguished. Furthermore, it works under the weakest scheduling assumptions, a.k.a. the unfair daemon. Built over any asynchronous self-stabilizing underlying unison U, our solution adds only O(log D) bits per node. Since there exists no unison algorithm with better space complexity, the extra space of our solution is negligible w.r.t. the space complexity of the underlying unison algorithm. 
Our algorithm provides a positive answer in O(max (k, k’, D)) time units, where k and k’ are the stabilization time complexities of A and U, respectively.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133185928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miao Cai, Junru Shen, Tianning Zhang, Hao Huang, Baoliu Ye
{"title":"SigGuard: Hardening Vulnerable Signal Handling in Commodity Operating Systems","authors":"Miao Cai, Junru Shen, Tianning Zhang, Hao Huang, Baoliu Ye","doi":"10.1109/SRDS55811.2022.00030","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00030","url":null,"abstract":"Signals are a useful mechanism provided by many commodity operating systems. However, current signal handling raises serious security concerns because its design lacks integrity protection for the signal handling control flow. Adversaries exploit the resulting weaknesses to mount dangerous control-flow attacks. To tackle these issues, this paper investigates the root causes of signal-related attacks and proposes SigGuard to harden the vulnerable signal handling mechanism. To protect the unsafe signal handler execution flow, we design a customized signal handler CFI framework which supports low-cost, reentrant, online CFI analysis and enforcement. To secure the signal handler return control flow, we propose an efficient, software-based, intra-process memory isolation method to ensure signal frame data integrity. We evaluate SigGuard with both security and performance experiments. In the security experiments, SigGuard successfully thwarts four signal-based attacks, including two proof-of-concept exploits and two realistic attacks conducted on the Nginx and Apache server programs, respectively. We also evaluate SigGuard's key techniques with a series of microbenchmarks and real-world applications. 
Experimental results suggest that key defense techniques used in SigGuard introduce reasonable performance costs.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"135 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132948986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Byzantine Auditable Atomic Register with Optimal Resilience","authors":"Antonella del Pozzo, A. Milani, Alexandre Rapetti","doi":"10.1109/SRDS55811.2022.00020","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00020","url":null,"abstract":"An auditable register extends the classical register with an audit operation that returns information on the read operations performed on the register. In this paper, we study Byzantine-resilient auditable register implementations in an asynchronous message-passing system. Existing solutions implement the auditable register on top of at least $4f+1$ servers, where at most $f$ can be Byzantine. We show that $4f+1$ servers are necessary to implement auditability without communication between servers. Then, we pursue the study by relaxing the constraint on the servers' communication, letting them interact with each other. In this setting, we prove that $3f+1$ servers are sufficient. This result establishes that with communication between servers, auditability does not come with an additional cost in terms of the number of servers.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131338462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"External Reviewers & Co-Reviewers","authors":"","doi":"10.1109/srds55811.2022.00009","DOIUrl":"https://doi.org/10.1109/srds55811.2022.00009","url":null,"abstract":"","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"69 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114023596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Liang, Xing Gao, Kun Sun, Wenjie Xiong, Haining Wang
{"title":"An Investigation on Data Center Cooling Systems Using FPGA-based Temperature Side Channels","authors":"Y. Liang, Xing Gao, Kun Sun, Wenjie Xiong, Haining Wang","doi":"10.1109/SRDS55811.2022.00015","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00015","url":null,"abstract":"As power and cooling cost has become a major factor in the total cost of ownership (TCO) of large-scale data centers, it is important to investigate how data centers run their cooling systems in practice. The data centers of Amazon Web Services (AWS) have been continuously expanding worldwide, and their restrictive security policies keep many management aspects of data centers private. In this paper, we make an attempt to explore the cooling systems of AWS data centers without privileged access. We first demonstrate PVT (process, voltage, and temperature) variations in AWS FPGAs (Field Programmable Gate Arrays) using time-to-digital converters (TDCs). We further leverage the DRAM temperature side channel and improve the usage of the TDC to measure the temperature change accurately. We conduct a measurement on the daily temperatures of AWS data centers worldwide and find that temperature changes of some data centers are closely related to local weather. Thus, we deduce they adopt free cooling techniques. 
This measurement study motivates us to re-think the vulnerability of data centers to power/thermal attacks.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116423528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weizhao Jin, B. Krishnamachari, Muhammad Naveed, Srivatsan Ravi, Eduard Sanou, Kwame-Lante Wright
{"title":"Secure Publish-Process-Subscribe System for Dispersed Computing","authors":"Weizhao Jin, B. Krishnamachari, Muhammad Naveed, Srivatsan Ravi, Eduard Sanou, Kwame-Lante Wright","doi":"10.1109/SRDS55811.2022.00016","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00016","url":null,"abstract":"Publish-subscribe protocols enable real-time multi-point-to-multi-point communications for many dispersed computing systems like Internet of Things (IoT) applications. Recent interest has focused on adding processing to such publish-subscribe protocols to enable computation over real-time streams such that the protocols can provide functionalities such as sensor fusion, compression, and other statistical analysis on raw sensor data. However, unlike pure publish-subscribe protocols, which can be easily deployed with end-to-end transport layer encryption, it is challenging to ensure security in such publish-process-subscribe protocols when the processing is carried out on an untrusted third party. In this work, we present $\\mathcal{XYZ}$, a secure publish-process-subscribe system that can preserve the confidentiality of computations and support multi-publisher-multi-subscriber settings. Within $\\mathcal{XYZ}$, we design two distinct schemes: the first using Yao's garbled circuits (the GC-Based Scheme) and the second using homomorphic encryption with proxy re-encryption (the Proxy-HE Scheme). We build implementations of the two schemes as an integrated publish-process-subscribe system. We evaluate our system on several functions and also demonstrate real-world applications. 
The evaluation shows that the GC-Based Scheme can finish most tasks two orders of magnitude faster than the Proxy-HE Scheme, while the Proxy-HE Scheme can still securely complete tasks within an acceptable time for most functions, but with a different security assumption and a simpler system structure.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114809494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In-Vivo Fuzz Testing for Network Services","authors":"Wen-Yang Lai, Kun-Che Tsai, Che Chen, Yu-Sung Wu","doi":"10.1109/SRDS55811.2022.00014","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00014","url":null,"abstract":"Fuzz testing is typically carried out by running the target program and the fuzzing engine offline in a lab environment. The environment setup may depend on specialized test harness code to activate the target program and inject the test data. Also, due to the vast program state space, domain knowledge-dependent optimization is often needed in the environment setup to achieve reasonably efficient fuzz testing. We propose In-Vivo Fuzzing to alleviate the burdens by performing online fuzz testing on live programs. In-Vivo Fuzzing hooks I/O library calls in a live program to collect test seeds. Upon request, the In-Vivo Runtime will create a fork of the target program and carry out fuzz testing on the forked process. The runtime states from the live program provide a vantage point to start the fuzzing process, and the test seeds collected from the live workload also facilitate the generation of effective test inputs. We applied In-Vivo Fuzzing to network service programs and implemented a prototype on top of the AFL fuzzer. Experiment results indicate that In-Vivo Fuzzing can reach vulnerabilities in real-world programs much more quickly than the baseline. 
We also demonstrate the potential application of In-Vivo Fuzzing in detecting unknown attacks, where live attack states are captured and amplified through fuzz testing.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130376799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Fault Trees with Correlated Failure Times - Modeling and Efficient Analysis -","authors":"P. Buchholz, A. Blume","doi":"10.1109/SRDS55811.2022.00027","DOIUrl":"https://doi.org/10.1109/SRDS55811.2022.00027","url":null,"abstract":"Dynamic Fault Trees (DFTs) are a powerful and widely used class of models for reliability analysis of technical systems. They describe the relation between failure times of elementary components and failures of the system modeled by the DFT. Failure times of elementary components are assumed to be independent and often exponentially distributed. Then the underlying stochastic process is a Continuous Time Markov Chain (CTMC) which is often analyzed numerically. In this paper, we use phase type distributions to model failure times of elementary components and extend DFTs by introducing two new types of nodes to express different variants of correlation between failure times which often can be observed in real systems. Since the use of phase type distributions enlarges the state space of the CTMC, compositional techniques allowing a compact representation of the generator matrix and analysis techniques exploiting this compact representation are also introduced. In particular, analysis techniques are presented that exploit the specific structure of the DFT.","PeriodicalId":143115,"journal":{"name":"2022 41st International Symposium on Reliable Distributed Systems (SRDS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131047591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}