2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS): Latest Papers, Page 3

Controlling Cascading Failures in Interdependent Networks under Incomplete Knowledge

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.14

D. Z. Tootaghaj, N. Bartolini, Hana Khamfroush, T. La Porta

Abstract: Vulnerability due to the interconnectivity of multiple networks has been observed in many complex networks. Previous work mainly focused on robust network design and on recovery strategies after sporadic or massive failures, assuming complete knowledge of the failure locations. We focus on cascading failures involving the power grid and its communication network, and on the resulting imprecision in damage assessment. We tackle the problem of mitigating an ongoing cascading failure and providing a recovery strategy. We propose a two-step failure mitigation strategy: (1) once a cascading failure is detected, we limit its further propagation by redistributing generator and load power; (2) we formulate a recovery plan that maximizes the total power delivered to the demand loads during the recovery intervention. To cope with insufficient knowledge of damage locations, our approach uses a new algorithm that determines consistent failure sets (CFS). We show that, given knowledge of the system state before the disruption, the CFS algorithm can find all consistent sets of unknown failures in polynomial time, provided that each connected component of the disrupted graph has at least one line whose failure status is known to the controller.

Pages: 54-63

Citations: 15
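
The polynomial-time guarantee above depends on a structural precondition: every connected component of the disrupted graph must contain at least one line whose status is known. The Python sketch below checks only that precondition using union-find. It is not the authors' CFS algorithm. The function name and the edge-list input format are hypothetical.

```python
from collections import defaultdict


def components_have_known_line(edges, known_lines):
    """Check that every connected component containing a line has a known-status line.

    edges: iterable of (u, v) pairs, the lines of the disrupted graph (hypothetical format).
    known_lines: iterable of (u, v) pairs whose failure status the controller knows.
    Isolated nodes have no lines, so they are ignored.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    edges = list(edges)
    for u, v in edges:
        union(u, v)

    # Lines are undirected, so (u, v) and (v, u) denote the same line.
    known = {frozenset(e) for e in known_lines}
    has_known = defaultdict(bool)
    for u, v in edges:
        root = find(u)
        has_known[root] = has_known[root] or frozenset((u, v)) in known

    # Every component that contains at least one line needs a known line.
    return all(has_known[r] for r in {find(u) for u, _ in edges})
```

For example, with lines (1, 2), (2, 3) and (4, 5), knowing only line (2, 1) leaves the component {4, 5} without any known line, so the check fails.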

Runtime Measurement Architecture for Bytecode Integrity in JVM-Based Cloud

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.39

Haihe Ba, Huaizhe Zhou, Jiangchun Ren, Zhiying Wang

Citations: 2

DottedDB: Anti-Entropy without Merkle Trees, Deletes without Tombstones

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.28

Ricardo Gonçalves, Paulo Sérgio Almeida, Carlos Baquero, V. Fonte

Abstract: To achieve high availability in the face of network partitions, many distributed databases adopt eventual consistency, allow temporary conflicts due to concurrent writes, and use some form of per-key logical clock to detect and resolve such conflicts. Furthermore, nodes synchronize periodically to ensure replica convergence in a process called anti-entropy, normally using Merkle Trees. We present the design of DottedDB, a Dynamo-like key-value store that uses a novel node-wide logical clock framework to overcome three fundamental limitations of the state of the art: (1) it minimizes the per-key metadata needed to track causality, avoiding its growth even under node churn; (2) it deletes keys correctly and durably, with no need for tombstones; (3) it offers a lightweight anti-entropy mechanism to converge replicated data, avoiding the need for Merkle Trees. We evaluate DottedDB against MerkleDB, an otherwise identical database that uses per-key logical clocks and Merkle Trees for anti-entropy, to precisely measure the impact of the novel approach. Results show that causality metadata per object always converges rapidly to a single id-counter pair; distributed deletes are achieved correctly without global coordination and with constant metadata; and divergent nodes are synchronized faster, with a smaller memory footprint and less communication overhead than with Merkle Trees.

Pages: 194-203

Citations: 5
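
The abstract describes replacing per-key clocks and Merkle Trees with a node-wide logical clock. The Python sketch below gives a rough idea of why such a clock can make anti-entropy cheap. It assumes that a node records the dots (counters) it has seen from each peer as a contiguous prefix plus a set of out-of-order counters, and it computes which dots one node is missing relative to another. The class and function names are hypothetical, and a Python set stands in for whatever compact encoding a real store would use. This is not DottedDB's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CompactDots:
    """Counters seen from one peer: every counter in 1..base, plus out-of-order extras.

    Hypothetical illustration of a prefix-plus-exceptions node clock entry.
    """
    base: int = 0
    extra: set = field(default_factory=set)

    def add(self, counter):
        if counter > self.base:
            self.extra.add(counter)
        # Absorb extras that now extend the contiguous prefix.
        while self.base + 1 in self.extra:
            self.base += 1
            self.extra.remove(self.base)

    def contains(self, counter):
        return counter <= self.base or counter in self.extra

    def max_counter(self):
        return max([self.base, *self.extra])


def missing_dots(local, remote):
    """Return (peer_id, counter) dots that `remote` has seen but `local` has not.

    local, remote: dict mapping peer_id -> CompactDots (a hypothetical node clock).
    """
    missing = []
    for peer, r in remote.items():
        l = local.get(peer, CompactDots())
        for c in range(1, r.max_counter() + 1):
            if r.contains(c) and not l.contains(c):
                missing.append((peer, c))
    return missing
```

In this representation, a node that has seen every write from a peer stores a single counter for that peer. During anti-entropy, comparing two such clocks directly identifies the writes to transfer, with no hash-tree traversal.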

Reconfiguring Parallel State Machine Replication

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.23

E. Alchieri, F. Dotti, O. Mendizabal, F. Pedone

Citations: 22

Hybrid-RC: Flexible Erasure Codes with Optimized Recovery Performance and Low Storage Overhead

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.17

Liuqing Ye, D. Feng, Yuchong Hu, Qing Liu

Abstract: Erasure codes are widely used in practical storage systems to protect against disk failures and data loss. However, these codes require excessive disk I/O and network traffic to recover unavailable data, so their recovery performance is suboptimal. Among all erasure codes, Minimum Storage Regenerating (MSR) codes achieve the optimal repair bandwidth at minimum storage during recovery, but some open issues must be addressed before they can be applied in real systems. In this paper, we present Hybrid Regenerating Codes (Hybrid-RC), a new family of erasure codes with optimized recovery performance and low storage overhead. The codes leverage the advantages of MSR codes to compute a subset of data blocks, while other parity blocks are used to maintain reliability. As a result, our design is near-optimal with respect to storage and network traffic. We show that Hybrid-RC reduces the reconstruction cost by up to 21% compared to Local Reconstruction Codes (LRC) with the same storage overhead. Most importantly, in Hybrid-RC each block contributes only half the amount of data when a single block failure is repaired. Therefore, the number of I/Os consumed per block is reduced by 50%, which helps balance the network load and reduce latency.

Pages: 124-133

Citations: 6
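
The "half the data per block" claim can be related to the standard MSR repair-bandwidth point from regenerating-code theory. With a file of size M stored in blocks of size α = M/k, a lost block is rebuilt from d helpers, each sending α/(d - k + 1). The Python sketch below compares this with conventional repair, which reads k whole blocks. The choice d = k + 1, which yields exactly half a block per helper, is an illustrative assumption and is not a parameter taken from the paper.

```python
from fractions import Fraction


def rs_repair_traffic(k, block_size):
    """Conventional (e.g. Reed-Solomon) repair: read k whole blocks to rebuild one."""
    return k * block_size


def msr_repair_traffic(k, d, block_size):
    """MSR repair point: d helpers each send block_size / (d - k + 1).

    Returns (per-helper traffic, total traffic) as exact fractions.
    """
    if d < k:
        raise ValueError("MSR repair needs at least k helpers (d >= k)")
    per_helper = Fraction(block_size) / (d - k + 1)
    return per_helper, d * per_helper
```

For example, with k = 6 and d = 7, conventional repair reads 6 whole blocks. At the MSR point, 7 helpers each send half a block, for 3.5 blocks in total. With d = k, the MSR point offers no saving over conventional repair.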

A Horizontally Scalable and Reliable Architecture for Location-Based Publish-Subscribe

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.16

B. Chapuis, B. Garbinato, Lucas Mourot

Abstract: With billions of connected users and objects, location-based services face a massive scalability challenge. We propose a horizontally scalable and reliable location-based publish/subscribe architecture that can be deployed on a cluster of commodity hardware. Like many modern location-based publish/subscribe systems, our architecture supports moving publishers as well as moving subscribers. When a publication moves into the range of a subscription, the owner of that subscription is instantly notified via a server-initiated event, usually a push notification. To achieve this, most existing solutions rely on classic indexing data structures, such as R-trees, and struggle to scale beyond a single computing unit. To achieve horizontal scalability, our architecture introduces a multi-step routing mechanism that efficiently combines range partitioning, consistent hashing, and a min-wise hashing agreement. In case of node failure, an active replication strategy ensures reliable delivery of publications throughout the multi-step routing mechanism. From an algorithmic perspective, we show that the number of messages required to compute a match is optimal in the execution model we consider and that the number of routing steps is constant. Using experimental results, we show that our method achieves high throughput and low latency and scales horizontally. For example, on a cluster of 200 nodes, our architecture can process up to 190,000 location updates per second for a fleet of nearly 1,900,000 moving entities, producing more than 130,000 matches per second.

Pages: 74-83

Citations: 8
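
The abstract names three routing ingredients: range partitioning, consistent hashing, and a min-wise hashing agreement. It does not say exactly how they are composed. The Python sketch below shows one plausible combination. It assumes a fixed latitude/longitude grid for range partitioning, a hash ring with virtual nodes for placing and replicating each grid cell, and a "smallest hash wins" rule that lets every node independently agree on one replica for a given key. All names and parameters (`grid_cell`, `HashRing`, `min_wise_choice`, `cell_deg`, `vnodes`) are hypothetical. This is not the paper's routing mechanism.

```python
import bisect
import hashlib


def _h(text):
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def grid_cell(lat, lon, cell_deg=1.0):
    """Range-partition coordinates into a fixed-size grid cell (illustrative)."""
    return (int(lat // cell_deg), int(lon // cell_deg))


class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=16):
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def owners(self, key, replicas=2):
        """Return the first `replicas` distinct nodes clockwise from hash(key)."""
        start = bisect.bisect(self._keys, _h(key))
        found = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == replicas:
                break
        return found


def min_wise_choice(key, candidates):
    """Pick the candidate with the smallest hash(key, node); any node computes the same answer."""
    return min(candidates, key=lambda n: _h(f"{key}|{n}"))
```

Usage: `ring.owners(str(grid_cell(46.52, 6.63)))` returns the replica set for that cell, and `min_wise_choice("sub-42", owners)` selects a single node for that subscription without any coordination messages, because every node evaluates the same hashes.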

An End-To-End Log Management Framework for Distributed Systems

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.41

Pinjia He

Citations: 5

Robust Multi-Resource Allocation with Demand Uncertainties in Cloud Scheduler

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.12

Jianguo Yao, Q. Lu, H. Jacobsen, Haibing Guan

Abstract: A cloud scheduler manages multiple resources (e.g., CPU, GPU, memory, and storage) in a cloud platform to improve resource utilization and achieve cost efficiency for cloud providers. Optimal multi-resource allocation has become a key technique in cloud computing and has attracted growing research attention. Existing multi-resource allocation methods assume that each job has constant demands for multiple resources. However, these methods may not apply to a real cloud scheduler, because resource demands change during job execution. In this paper, we study a robust multi-resource allocation problem under the uncertainty introduced by varying resource demands. To this end, the cost function is chosen as one of two multi-resource efficiency-fairness metrics, Fairness on Dominant Shares and Generalized Fairness on Jobs, and we model resource demand uncertainty with three typical models: scenario demand uncertainty, box demand uncertainty, and ellipsoidal demand uncertainty. Solving the resulting optimization problem yields a robust multi-resource allocation under uncertainty for the cloud scheduler. Extensive simulations show that the proposed approach handles resource demand uncertainty and that the cloud scheduler runs in an optimized and robust manner.

Pages: 34-43

Citations: 17
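
To make "dominant shares" and "box demand uncertainty" concrete, the Python sketch below runs a discrete progressive-filling allocation in the style of Dominant Resource Fairness (DRF). This criterion is related to, but not the same as, the paper's Fairness on Dominant Shares metric. Box uncertainty is handled in the simplest conservative way: each task reserves its worst-case demand d + δ from the box [d - δ, d + δ]. This is not the paper's robust optimization formulation. The function name and input layout are hypothetical, and every user is assumed to demand a positive amount of at least one resource.

```python
def drf_progressive_fill(capacity, demands, delta=None):
    """Discrete DRF-style progressive filling on worst-case (box) demands.

    capacity: list with the total amount of each resource.
    demands: dict user -> nominal per-task demand vector.
    delta: dict user -> per-resource deviation; the true per-task demand is
           assumed to lie in [d - delta, d + delta], so d + delta is reserved.
    Assumes each user demands a positive amount of at least one resource.
    Returns dict user -> number of tasks granted.
    """
    delta = delta or {}
    worst = {
        u: [d + e for d, e in zip(dem, delta.get(u, [0] * len(dem)))]
        for u, dem in demands.items()
    }
    used = [0.0] * len(capacity)
    tasks = {u: 0 for u in demands}
    active = set(demands)

    def dominant_share(u):
        # Largest fraction of any single resource that user u currently holds.
        return max(tasks[u] * r / c for r, c in zip(worst[u], capacity))

    while active:
        # Serve the active user with the smallest dominant share (ties: insertion order).
        u = min((v for v in demands if v in active), key=dominant_share)
        if all(x + r <= c for x, r, c in zip(used, worst[u], capacity)):
            used = [x + r for x, r in zip(used, worst[u])]
            tasks[u] += 1
        else:
            active.discard(u)  # this user's next task no longer fits
    return tasks
```

With 9 CPUs and 18 GB of memory, user A needing (1 CPU, 4 GB) per task and user B needing (3 CPUs, 1 GB), the sketch grants 3 tasks to A and 2 to B, which is the classic DRF outcome. Adding a 2 GB memory deviation to A's demand (δ_A = (0, 2)) reduces A to 2 tasks. This shows the price of reserving for the worst case.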

Incremental Elasticity for NoSQL Data Stores

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-09-01 DOI: 10.1109/SRDS.2017.26

Antonis Papaioannou, K. Magoutis

Citations: 1

On the Robustness of a Neural Network

2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS) Pub Date: 2017-07-25 DOI: 10.1109/SRDS.2017.21

El Mahdi El Mhamdi, R. Guerraoui, Sébastien Rouault

Abstract: With the development of neural-network-based machine learning and its use in mission-critical applications, concerns are growing about the black-box nature of neural networks, and it has become crucial to understand their limits and capabilities. With the rise of neuromorphic hardware, it is even more critical to understand how a neural network, viewed as a distributed system, tolerates failures of its computing nodes (neurons) and its communication channels (synapses). Experimentally assessing the robustness of neural networks involves the quixotic venture of testing all possible failures on all possible inputs. The former runs into a combinatorial explosion, and the latter requires gathering every possible input, which is impossible. In this paper, we prove an upper bound on the expected error of the output when a subset of neurons crashes. The bound's dependencies on the network parameters may be too pessimistic in the average case. It has a polynomial dependency on the Lipschitz coefficient of the neurons' activation function and an exponential dependency on the depth of the layer where a failure occurs. We back up our theoretical results with experiments illustrating the extent to which our prediction matches the dependencies between the network parameters and robustness. Our results show that the robustness of neural networks to the average crash can be estimated without testing the network on all failure configurations or accessing the training set used to train the network, both of which are practically impossible requirements.

Pages: 84-93

Citations: 19
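
The abstract explains how a crash bound can depend on the activation's Lipschitz coefficient and on depth. The Python sketch below shows the mechanism with a much cruder tool: a worst-case infinity-norm bound, not the expected-error bound proved in the paper. A neuron that crashes (outputs 0) perturbs its layer by at most its healthy activation. Each downstream layer can amplify that perturbation by at most K·‖W‖∞, where K = 1 for both tanh and the identity. The resulting product over downstream layers grows geometrically with the number of layers the error crosses. The network shape (tanh hidden layers, linear output) and all function names are assumptions made for illustration.

```python
import math
import random


def inf_norm(matrix):
    """Induced infinity norm of a matrix: maximum absolute row sum."""
    return max(sum(abs(w) for w in row) for row in matrix)


def matvec(matrix, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def forward(weights, x, crashed_layer=None, crashed=()):
    """tanh hidden layers and a linear output; crashed hidden neurons output 0."""
    for idx, w in enumerate(weights):
        x = matvec(w, x)
        if idx < len(weights) - 1:
            x = [math.tanh(v) for v in x]
            if idx == crashed_layer:
                x = [0.0 if i in crashed else v for i, v in enumerate(x)]
    return x


def crash_bound(weights, x, crashed_layer, crashed):
    """Upper bound on ||F(x) - F_crash(x)||_inf for a non-empty set of crashed
    neurons in hidden layer `crashed_layer` (tanh and identity are 1-Lipschitz)."""
    healthy = x
    for w in weights[: crashed_layer + 1]:
        healthy = [math.tanh(v) for v in matvec(w, healthy)]
    err = max(abs(healthy[i]) for i in crashed)  # perturbation at the crashed layer
    for w in weights[crashed_layer + 1:]:
        err *= inf_norm(w)  # each later layer amplifies by at most 1 * ||W||_inf
    return err
```

Usage: build random weights as lists of rows, for example `[[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]` with `rng = random.Random(0)`. Compare `forward(weights, x)` with `forward(weights, x, layer, crashed)`. The observed output error stays below `crash_bound(...)` and is typically far below it, which illustrates why such worst-case bounds are pessimistic in the average case.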